PREFACE 


Despite all the work that has been done by British psychologists 
in the field of mental testing, it is curious that no textbook has 
been published since Ballard’s pioneering volumes in the early 
twenties. There are many useful but brief accounts, both in books 
for teachers and for psychology students, but these do not attempt 
to cover all the main tests in common use, and they seldom bring 
out the underlying principles of test construction or of the inter- 
pretation of test results. There are also admirable full-scale books 
by American authors. However, these represent a rather different 
attitude to testing and its uses from that current here; and they 
cover far too many tests which have no relevance for us. 

I have tried therefore to supply a book which will meet the 
needs of students of psychology, while at the same time making it 
sufficiently simple to be useful to teachers or others interested in 
education, the social sciences, or generally in assessing, selecting, 
or guiding human beings. I have also referred to British research 
literature, where available, rather than to American, since the 
latter has often been ably summarised elsewhere. 

A further reason for writing this book is that the views of 
psychologists on the nature of intelligence, and on what intelli- 
gence tests measure, have become modified considerably over the 
past third of a century. There is much that is controversial in this 
field; hence my aim has been to give a balanced view, and to show 
how this is supported by the experimental evidence. If, as I shall 
conclude, the intelligence we are measuring is not primarily an 
innate capacity, a good deal of reorientation is required in our 
views on the uses to be made of tests. 

I am indebted to several colleagues for reading and criticising 
the manuscript, notably Dr. C. A. Rogers of the University of 
Southern Rhodesia, Dr. D. M. Lee and Dr. W. H. King of the 
London Institute of Education, and my sister, Professor M. D. 
Vernon. Acknowledgments are also made to the following for 
permission to reproduce tables or other material: Mr. J. C. Raven 
(Fig. 7), Houghton, Mifflin Co. (Table VI), Stanford University 
Press (Table XI), and the Controller of H.M. Stationery Office 
(Table X). 


January, 1959 P. E. VERNON 


Chapter One 
HISTORICAL INTRODUCTION 


The Press, together with most members of the British public, 
still seem to regard intelligence tests and tests of educational attain- 
ments with a good deal of suspicion. They are thought of as some- 
thing new-fangled, invented by psychologists, and as much less 
trustworthy or well-tried than the ordinary school examination. 
In fact their history goes back fifty years or more; as early as 1920 
they were fully developed, were being applied to most of their 
present-day purposes, and had been shown to be superior to ex- 
aminations for many such purposes. Many of the criticisms and 
suspicions that one still hears, then, are based on misunderstand- 
ings, which it is the object of this book to dispel. At the same time 
these tests are open to various imperfections and difficulties; and 
these too we will try to describe impartially. 

In the latter half of the nineteenth century, the early psycho- 
logists devoted considerable attention to the measurement of 
sensory capacities. For example, they tested the smallest difference 
in the pitch of sounds, or in the heaviness of weights, that people 
could detect. Also they studied reaction time, or quickness of 
response, mechanical memorising, and other rather elementary 
abilities. It was soon found, however, that such abilities bore little 
relation to the intelligence of students or to their educational 
achievements. The first test of higher intellectual abilities was de- 
vised by Ebbinghaus in 1897, and known as the Completion Test. 
This tried to show a child’s capacity to comprehend and relate 
ideas by getting him to fill in missing words in a story: 

Ex. 1. One ( ) ( ) eagle ( ) with the ( ) birds ( ) see ( ) 
could ( ) ( ) highest. 

Missing words, in order: day, the, went, other, to, which, fly, the. 


However, it was the French psychologist, Alfred Binet, who 
first arrived at the essential principles of mental testing and who 
produced, in 1905, the first practicable scale for expressing intelli- 
gence in numerical units. 


THE BINET-SIMON SCALES 


For many years Binet had studied the physical and mental 
development of his own daughters, and had noted their growth, 
from year to year, in comprehending, reasoning, judgment, and 
other intellectual capacities. Thus when he was approached by the 
Paris education authorities with a request for assistance in dis- 
covering children who were too dull to be educated in ordinary 
schools, he conceived the possibility of a series of mental tasks 
which would be characteristic of the development of normal 
children of a given age. The average 6-year old, for example, 
was found to be capable of giving his age, reproducing a sentence 
of 16 syllables, counting 13 pennies correctly, copying a diamond 
shape, and defining ‘horse’, ‘chair’, etc., in terms of use. But a 
retarded child might not achieve the same tasks until he was 7, 8 
or more, and could thus be said to be one or two years behind in 
his intellectual development or Mental Age. Suppose it was found, 
for example, that repetition of 16 syllables was done correctly by 
some 15 per cent of 4-year olds, 40 per cent of 5-year olds, 73 per 
cent of 6-year olds and 85 per cent of 7-year olds: this task was 
then assigned to the 6-year level on Binet’s scale, and children who 
passed this and other 6-year items were said to have a Mental Age 
of 6, although their actual or chronological age might be higher 
or lower than 6. 


Numerous tasks were tried out b[...] 
Simon, as providing samples or spec[...] 
development. They were com[...] 
chosen which we[...] 
4, ... 10, 12, 15 years and adult. Tra[...] 
English were soon made by [...] 
revision which was 
capable of covering almost the whole range of intelligence from 
3-year to adult levels. This Stanford-Binet scale was the most 
useful, and the most widely applied, of all psychological tests for 
the next twenty-one years. 

Binet had observed that children’s responses were somewhat 
irregular, in that they often failed some items at a lower level than 
their Mental Age and passed other items at a higher level. Thus a 
more detailed scoring system was introduced. Suppose the com- 
plete record of a 5:0-year old to be: 

3-year tests   5/5 passed 
4-year tests   4/5 passed 
5-year tests   2/5 passed 
6-year tests   1/5 passed 
7-year tests   0/5 passed 

In this case the 3-year level, where all tests are passed, is called the 
basal age. Each subsequent test passed, that is seven in all, gives a 
credit of one-fifth of a year. So the final Mental Age is 3 + 7/5 = 4 2/5 
years, or approximately 4 years 5 months. 
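The scoring rule just described can be checked with a short sketch. It assumes, as in the example record, five tests at each age level; the function names are illustrative only and do not come from any published scoring manual.

```python
from fractions import Fraction


def mental_age(basal_age, passes_above_basal, tests_per_level=5):
    """Basal age plus a credit of 1/tests_per_level year for each later test passed.

    passes_above_basal lists the number of tests passed at each level
    above the basal level, in order of age.
    """
    credits = sum(passes_above_basal)
    return Fraction(basal_age) + Fraction(credits, tests_per_level)


def years_and_months(age_in_years):
    """Express an age in years as (whole years, months), to the nearest month."""
    total_months = round(age_in_years * 12)
    return divmod(total_months, 12)


# Record of the 5:0-year old: basal age 3, then 4, 2, 1 and 0 passes
# at the 4-, 5-, 6- and 7-year levels.
ma = mental_age(3, [4, 2, 1, 0])
print(ma)                    # 22/5, i.e. 4 2/5 years
print(years_and_months(ma))  # (4, 5)
```

Exact fractions are used so that the seven credits of one-fifth of a year are not blurred by decimal rounding.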


It was also observed that the degree of retardation or advance- 
ment, in Mental Age years, tended to increase as children got 
older. A child who was 1 year backward at 5 would be more 
likely to be 2 years than 1 year backward at 10. The German 
psychologist, Stern, suggested that the ratio of Mental to Chron- 
ological Age would remain relatively constant, and Terman de- 
noted this Mental Ratio (multiplied by 100) as the Intelligence 
Quotient or I.Q. 

I.Q. = (M.A. / C.A.) × 100 

Thus our 5:0-year old with a M.A. of 4 2/5 years has an I.Q. of 
88. A child who is just normal for his age has, by definition, an 
I.Q. of 100. I.Q.s of 130 to 150 or more are regarded as very 
superior, and those below 70 or so usually indicate feeble- 
mindedness or mental defect (cf. p. 166). We shall discuss later 
the extent to which this I.Q. really remains constant as a child 
grows older. 
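As a small worked illustration of the quotient, the sketch below applies the ratio of Mental to Chronological Age, multiplied by 100, to the 5:0-year old of the example. The function name and the rounding to a whole number are assumptions made here for convenience.

```python
from fractions import Fraction


def intelligence_quotient(mental_age, chronological_age):
    """Ratio of Mental Age to Chronological Age, multiplied by 100.

    Both ages must be in the same unit (years here). The result is
    rounded to the nearest whole number.
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return round(100 * mental_age / chronological_age)


# The 5:0-year old with a Mental Age of 4 2/5 years.
print(intelligence_quotient(Fraction(22, 5), 5))  # 88

# A child who is just normal for his age.
print(intelligence_quotient(7, 7))  # 100
```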
With the approach of maturity a complication arises. The 
growth of intelligence seems to slow down soon after 12 years 
and to cease somewhere around 14 to 16 years (this statement, too, 
will require later qualifications). Thus the average adult obtains a 
Binet Mental Age of, say, 15, and if we continued to calculate his 
I.Q. by the usual formula it would obviously decline progressively, 
which is absurd. By the age of 30 it would drop from 100 to 50, 
i.e. (15 / 30) × 100. 

[...]ear regarding the [...] 
[...]ormer, not the latter, represents a c[...] 
The I.Q. is primarily an index of [...] 
thus, secondarily, a prediction of [...] 
[...]s reached maturity on the assumption 
that this rate of growth continues. Consider two children, 
A and B: 

A. C.A. 10:0   M.A. 12:0   I.Q. 120 
B. C.A. 11:0   M.A. 12:6   I.Q. 114 

[...] sense of being capable of pe[...] 
[...]icult intellectual work at the moment. A, with the [...] 
in the sense that he is likely to c[...] 
[...] 11-year selection 
examinations), either M.A. or I.Q. will give 
much the same information ab[...] 
levels. 
THE WORK OF CHARLES SPEARMAN 

In the early days of testing [...] 
how far people’s capacities de[...] 

1 The practice of using I.Q.s and Educati[...] 
selection, rather [...] Mental Ages or test scores, [...] 
youngest children in a year-group would be unfairly handicapped (cf. 
Vernon, 1957: Bibliography). 
that questions of this kind could be investigated scientifically by 
finding the precise amount of correspondence between various 
abilities. 

The statistical work of Sir Francis Galton and Karl Pearson in the 
nineteenth century had provided an index, known as the correlation 
coefficient, or r, for expressing the correspondence between any 
two sets of measurements of a group of people. This index ranges 
from +1.0 (perfect agreement), through 0.0 (no agreement either 
way), to -1.0 (negative or inverse agreement). Thus if a class of 
children are ranked in order for problem arithmetic and mechani- 
cal arithmetic, the correlation between the two orders is likely to 
be very close, with a coefficient of +0.8 or over. But if their arithmetic 
and handwriting marks are compared, the correlation may be 
much lower; some good arithmeticians are poor writers, and vice 
versa. But on the whole those good in one subject are also above 
average in other subjects, thus the coefficient is likely to be low 
positive, say +0.30. Arithmetic and age, on the other hand, may 
yield a negative correlation of -0.2 or lower, when (as often hap- 
pens) the brightest children in a school class tend to be the youngest. 
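To make the idea of correlating two rank orders concrete, here is a minimal sketch. It uses the rank-difference formula, 1 - 6Σd² / n(n² - 1), which is a convenient stand-in for the general coefficient and is valid only when there are no tied marks. The marks for the five pupils are invented purely for illustration.

```python
def ranks(scores):
    """Return the rank of each score (1 = highest). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_correlation(x, y):
    """Rank-difference correlation between two sets of untied marks."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 long")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical marks for five pupils (not taken from any real class).
problem_arithmetic = [18, 15, 12, 9, 6]
mechanical_arithmetic = [17, 16, 10, 11, 5]

# Only two adjacent pupils change places, so the coefficient is high (about 0.9).
print(rank_correlation(problem_arithmetic, mechanical_arithmetic))
```

Identical orders give +1.0 and exactly reversed orders give -1.0, matching the two ends of the range described above.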

Now Spearman noted that all tests, even of abilities as diverse 
as classics, mathematics, musical ability, weight discrimina- 
tion and teachers’ ratings for ‘cleverness’, correlated positively 
(though to varying extents), suggesting that there was some under- 
lying general ability running through all of them; and this com- 
mon element he designated as g or the factor of general ability. 
At the same time, in so far as the correlations were always well 
below 1.0, he suggested that every ability involves, besides g, 
something specific, which he called an s-factor. Thus musical 
ability and weight discrimination show only a small positive 
correlation because, while both depend partly on g, music also 
depends on a specific musical capacity and weight discrimina- 
tion on a quite independent capacity. Spearman’s work from 1904 
onwards was largely devoted to showing that most of the correla- 
tions between tests of different abilities could be accounted for in 
terms of g and s factors (the Two-Factor Theory).1 He found that 


1 A particularly clear exposition of the complex arguments and statistical 
calculations underlying the Two-Factor Theory is given by R. Knight (1943). 
Spearman's fullest statement of his methods and results is The Abilities of Man 
(1927). Cf. also Vernon (1956) for an elementary, and (1950) for a more ad- 
vanced treatment. 
tests which involved g to the greatest extent were those of the 
highest intellectual content, for example school marks in c[...] 
or tests of solving analogies or classifying similarities. S[...] 
tended to correlate most highly with teachers’ assessments of [...] 
pupils’ intelligence. But tests like mechanical [...] 
sensory discrimination were more highly specific, or le[...] 
saturated’. The essence of g appeared to be the capacity for grasp- 
ing relations or, as Spearman put it, for educing relations and 
correlates. Thus in the analogy: 

Ex. 2. Leg is to Knee as Arm is to . . . ? 

the person being tested must first comprehend the relation of 
Leg to Knee, and then apply it to the third term, Arm, in order 
to arrive at the answer, Elbow. 

Spearman did not, in fact, identify g with intelligence, [...] 
as shown in our next chapter, it was almost impossible to arrive 
at an acceptable definition of what intelligence means. By contrast, 
g is the highest common factor that can be extracted from any set 
of tests; it is an objective and mathematically definable quantity. 
However, his theory provided a basis for the construction of 
intelligence tests, since [...] 

as we shall see later, his Two-Factor Theory has by now been 
abandoned, 
minimised in examinations of certain kinds of attainment such as 
spelling or simple arithmetic sums, where only one right answer 
to each question is possible. Around 1908-10, therefore, Thorn- 
dike, Courtis and other American educational psychologists 
prepared tests of these and other subjects, consisting of sets of 
questions or items which could be scored objectively. Standards 
or norms of performance were provided by applying them to 
pupils in a large number of schools. Thus the typical or average 
performance for third grade, fourth grade, etc., pupils was deter- 
mined, and any new pupil could be assessed against these standards. 
Alternatively the scores of successive age-groups were established 
so that, as in the Binet test, results could be expressed in age units. 
A bright 9-year old, for example, might do as well as an average 
11-year old on an arithmetic test, and thus be said to have an 
Arithmetic Age of 11. In this way a meaningful and well-defined 
set of units was substituted for the variable standards of percentage 
or other marks awarded by teachers and examiners. Educational 
Quotients, analogous to Intelligence Quotients, could also be 
calculated. 
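The conversion of a raw score into age units, and then into a quotient, might be sketched as follows. The norms table is entirely hypothetical (invented average scores for each age-group), and straight-line interpolation between adjacent age-groups is an assumption of this sketch, not a procedure stated in the text.

```python
def attainment_age(score, norms):
    """Convert a raw score to an age equivalent using age-group norms.

    norms is a list of (age in years, average score) pairs, sorted by age,
    with average scores strictly increasing. Scores falling between two
    age-groups are placed by straight-line interpolation.
    """
    for (age_lo, score_lo), (age_hi, score_hi) in zip(norms, norms[1:]):
        if score_lo <= score <= score_hi:
            fraction = (score - score_lo) / (score_hi - score_lo)
            return age_lo + (age_hi - age_lo) * fraction
    raise ValueError("score lies outside the range covered by the norms")


def subject_quotient(age_equivalent, chronological_age):
    """Attainment age over Chronological Age, times 100, as for the I.Q."""
    return round(100 * age_equivalent / chronological_age)


# Hypothetical arithmetic norms: (age, average score of that age-group).
HYPOTHETICAL_NORMS = [(8, 20), (9, 26), (10, 31), (11, 35), (12, 38)]

# A bright 9-year old who scores 35, the average for 11-year olds.
arithmetic_age = attainment_age(35, HYPOTHETICAL_NORMS)
print(arithmetic_age)                       # 11.0
print(subject_quotient(arithmetic_age, 9))  # 122
```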

This approach to educational measurement was soon extended 
to more complex skills and attainments. Scales were developed 
for assessing the quality of handwriting, and the new-type or 
objective examination was put forward to overcome the defects 
of the conventional essay-type examination. Instead of setting 
students a few complex questions, their answers to which in- 
evitably led to discrepancies between examiners, each topic was 
reduced to a number of brief questions so arranged that only one 
right answer to each was possible. These included: 

(i) the simple question or completion form 

Ex. 3. What is the capital city of France? . . . 

4. Napoleon was finally defeated at the battle of . . . in the 
year . . . 

1 The terms E.Q. and A.Q. are often used to refer to English and Arithmetic 
Quotients. Sometimes, however, E.Q. means Educational Quotient, represent- 
ing the child’s average performance on a complete set of attainment tests; and 
A.Q. means Achievement or Accomplishment Quotient, which is derived from 
[...] (cf. p. 120). To avoid confusion it would be better to call the 
school subject quotients En.Q. and Ar.Q. 
(ii) statements to be marked True or False 

5. The author of The Faerie Queene was Christopher 
Marlowe TRUE FALSE 

and (iii) the multiple-choice item, together with various more 
complex derivatives (cf. Vernon, 1956). 


6. Sound travels most quickly through: 
A. cold water 
B. hot water 
C. the air 
D. a vacuum 


The new-type form was rapidly adopted in America, not only 
for standardised educati[...] 
also by schools and colleges [...] 
internal examinations. A stron[...] 
to an education authority, [...] 
eight [...] 
be shown as a [...] example, his Reading and 
Composition Ages, like his Mental Age, [...] to [...] 
Chronological Age, but his Arithmetic Age very much lower. 
[...] precise comparisons are [...] ordi[...] 
[...]ised school examinations [...] 
group intelligence test. In 1916, A. S. Otis in America was con- 
structing sets of analogies, opposites, and other types of items in- 
volving intellectual comprehension. Each set contained questions 
ranging from easy to difficult, and was given with a time limit 
such that only the most intelligent would answer all of them. As 
already mentioned, Spearman’s statistical approach provided the 
theoretical foundation for combining such sets into a battery of 
sub-tests for measuring g. When America entered World War I 
in 1917, a group of leading psychologists including Otis, Terman, 
Thorndike and Yerkes was able to convince the U.S. Army of 
the likely value of such a group test for measuring the general 
ability level of recruits. Several hundred men could be tested in 
the time previously needed to give one Stanford-Binet test; and 
some 1¾ million in all took the Army Alpha or Army Beta tests 
during the next two years. Army Alpha consisted of eight sub- 
tests, each timed separately. The following are examples of the 
types of item:1 

Ex. 7. Analogies. Leg is to knee as arm is to: 
(wrist, hand, elbow, shoulder) 

8. Number Series. 2 3 5 8 12 17 
(Write the next two numbers in the series) 

9. Mixed Sentences. River London is Thames the on. 
TRUE FALSE 

10. Vocabulary. moist dry SAME OPPOSITE 
11. order command SAME OPPOSITE 


Army Beta was based on non-verbal or pictorial problems, and 
was used for recruits unfamiliar with the English language, such 
as immigrants. 

Note that the scoring of group tests is objective, a matter of 
totalling the correct responses, whereas in a Binet test the child 
gives his answers in his own words and the tester assesses their 
adequacy. Army Alpha scores were not usually translated into 
Mental Ages or I.Q.s, but graded arbitrarily on a letter scale from 
A to E (cf. Yoakum and Yerkes, 1920). 

Many interesting findings arose from this vast-scale experi- 
ment, for example the differences in average scores between 
recruits of different national and racial descent (cf. p. 174). But 

1 Answers to these and later Examples are listed on p. 192. 


[...] 
doubt was soon cast on the assumption that these differences 
represented differences in innate intelligence, when it [...] 
that the highest scoring groups were also those w[...] 
socio-economic advancement and educational [...] 
negroes from some of the northern states of America [...] 
than whites from some of the southern states. It was a[...] shown 
that the average Army Alpha score for the total body of [...] 
was equivalent to a Mental Age of only 13 years. This [...] 
to imply that, while educational attainments and [...] 
knowledge or skills go on increasing till a late age, intelligence 
reaches its maximum during adolescence. Nowadays, however, 
we would ascribe this result partly to the fact that the Army tests 
were highly speeded (and abili[...] 
with age), and partly to the low level of education of a large pro- 
portion of adults in 1917, [...] 
on such a verbal test. Whe[...] 
in World War I, the re[...] 
equivalent to over 10 points of I.Q. [...] 
level of education [...] and the greater familiarity with 
tests (cf. Tuddenham, 1948). 

Finally, it was found repeatedly that Army Alpha scores cor- 
related well (to about 0.5) with assessments of the men’s later 
proficiency at a variety of Ar[...] 
[...]rdly be expected in view [...] 
specialised abilities an[...] 

[...] in 1919. Two years later, 
Thomson devised the Northumberland Mental Test and 
[...] demonstrate its usefulness in picking out bright children in rur[...] 
[...]rmance in ordinary selection examinations w[...] 
handicapped by poverty of background, ill-health or inefficient 
schooling. He also surveyed the results of over 13,000 children, 
classifying them by parental occupation (Duff and Thomson, 
1923). Gradually more and more education authorities followed 
suit, and by the 1930s Moray House, Edinburgh, was producing 
a new intelligence test every year for their use. Increasing dis- 
satisfaction with subjectively marked English and arithmetic 
papers in the scholarship examination also led many authorities to 
prefer Moray House, or other similar educational attainment tests. 


FURTHER DEVELOPMENTS IN TESTING IN BRITAIN 


American Army recruits who obtained very low scores on the 
group tests were often given a version of the Binet-Simon indi- 
vidual scale. But as this was based largely on verbal questions and 
answers, a series of practical performance tests, based chiefly on 
formboards and picture [...], was assembled by Pintner and 
Paterson in 1917 for use with men having language handicaps. 
This Pintner-Paterson scale was adapted by Gaw (1925) in Britain; 
but another collection, standardised and published by Drever and 
Collins in 1928, has commonly been used, e.g. with deaf children 
or others suspected of serious verbal deficiencies. 

Soon after the 1914-1918 war, the National Institute of In- 
dustrial Psychology was founded. Here a procedure for giving 
guidance to adolescents or young adults on suitable careers, based 
on a thorough study of each individual's aptitudes and inclinations 
by intelligence and other tests and interview, was worked out by 
Burt and his collaborators. The child guidance movement, 
initiated by Burt’s work with delinquent children in London and 
by Drever’s and Boyd’s clinics in Scotland, also expanded in the 
1920s and 1930s. The Stanford-Binet, Burt’s educational tests, and 
various performance tests were found invaluable in the detailed 
diagnosis of emotionally disturbed or of backward children. 
The scope and application of available tests extended rapidly in 
America, more slowly in Britain. Scales were constructed for 
assessing the developmental level of pre-school children, and pic- 
torial group tests were devised for school children from 5 up who 
could not read the instructions and items of the ordinary verbal 
test. In 1938 Penrose and Raven issued their Progressive Matrices 
(p. 84), which tests reasoning by means of abstract diagrammatic 
problems from the level of defective adults or 8-year children up 
to superior adult. This was adopted as the basic intelligence test 
by the Army and Navy personnel selection departments in 1941, 
and [...] several million recruits before the end of the war. 
The greater reliability of tests aiming to cover only a limited 
range of intelligence was realised by Richardson with his Simplex 
and Simplex Junior tests, and by R. B. Cattell with his Intelligence 
Scales 0, I, II, and III (the last of these being difficult enough to 
extend the university graduate). 
In 1937, the Stanford-Binet was replaced by the new Stanford 
revision or Terman-Merrill test, which provided two parallel 
scales for individual testing from 2:0 years to superior adult levels. 
Though generally employed for child guidance work, the Terman- 
Merrill showed certain weaknesses, which are described later; and 
at the time of writing many clinical psychologists are adopting 
other scales devised in America by D. Wechsler. The Wechsler- 
Bellevue Scales Forms I and II and the Wechsler Adult Scale are 
more appropriate for adult testing, and the Wechsler Intelligence 
Scale for Children covers the 5-15-year range. All of these yield 
separate verbal and performance test I.Q.s (cf. p. 56). The late 
1930s also saw the publication of Schonell’s useful series of reading, 
spelling, arithmetic and other educational tests (Schonell, 1950). 
Several of these are termed ‘diagnostic tests’, since they are de- 
signed less to provide overall measures of attainment than to 
reveal the [...] 

In the early days of the Second [...] 
[...] testing as part of its selection procedure 
for clerical officers, postal [...] 

1 It should be noted, however, that tests were regarded as aids rather than as 
sole criteria of suitability. Each recruit also filled in a biographical question[...] 
[...] together all the relevant information (cf. [...] 
grade administrators. With the 1944 Education Act it was hoped 
to introduce allocation of all 11-year-old children to appropriate 
types of secondary education, rather than selection merely of the 
‘cream’ for grammar schools. Different local education authori- 
ties vary considerably in the organisation of their secondary 
schools and in their allocation and selection procedures, but there 
are few which do not employ standardised tests at some stage.1 
Indeed, so much research time and effort has been devoted by 
British educational psychologists to improving and validating 
‘11-plus’ tests, and investigating other controversial aspects of 
selection, that progress has been slow in other important branches 
of testing. 

Nevertheless the university Education and Psychology Depart- 
ments, the National Institute of Industrial Psychology and other 
investigators have contributed much, by means of tests, to our 
knowledge of child development, of the causal factors in back- 
wardness and in educational or vocational success, of the effects 
of different teaching methods, of the reliability and other statistical 
aspects of school marks and examinations, and of the diversity and 
inter-relations of human abilities. Particularly worthy of mention 
is the Scottish Council for Research in Education which, since 
1930, has published over forty volumes of such enquiries, in- 
cluding the famous surveys of the intelligence of the complete 
11-year-old Scottish population in 1932 and 1947. The National 
Foundation for Educational Research in England and Wales was 
founded in 1945, and its main interests include the production of 
tests to serve the needs of the education system, and studies of 
secondary school selection and of factors that promote educational 
progress. The Australian Council for Educational Research like- 
wise has a highly efficient test-construction section, and several of 
its publications might well be of use in our own schools. 

DEVELOPMENTS IN TESTING IN THE U.S.A. 

All kinds of testing are so much more highly developed in 
America than here that we can draw attention only to one or two 
of the major trends (cf. Anastasi, 1954). First it is necessary to trace 
further the influence of Spearman’s work on factor analysis. 


1 Cf. the survey of practices in 1956 by Yates and Pidgeon (1957). For a 
fuller discussion of the problems and techniques of selection, see Vernon 
(1957). 

According to his view, g was almost the only important factor 
underlying people’s abilities, and thus the only one worth 
measuring. This rather narrow conception was criticised by 
Thomson and Burt in Britain, and by Kelley, Thurstone and 
others in America. Burt, for example, showed that additional 
sub-types of ability could be distinguished, such as verbal or 
linguistic, numerical, practical, etc. Thus, given two children with 
the same g or general intelligence, one might be better than the 
other at all tasks of a strongly verbal nature, suggesting that he is 
superior in a v factor. Such abilities (which Spearman himself be- 
gan to admit in the later 1920s and 1930s) are called group factors, 
since they run through groups of specialised tests. Thurstone went 
even further and, on the basis of his investigations during 1937-42, 
[...] general intelligence, breaking it [...] 
common factors, of which the [...] 
I (inductive reasoning), D (deductive), N 
(numerical), S (spatial), W (w[...] 
P (perceptual speed). Rather than test a child’s or adult’s global [...] 

[...] American psychologists have continued to pub- 
lish and use general intelligence tests, there has been a noticeable 
[...] more specialised tests. At college entrance, for 
example, most universities employ a test battery with verbal and 
mathematical sections (sometimes also spatial, in selecting 
engineers). And in addition to Thurstone’s Primary Mental 
Abilities (PMA) tests of his various factors, there are numerous 
series of factor tests and differential aptitude tests, aiming to 
[...] Reasoning, Verbal, Numerical, Spatial, Clerical and 
other abilities. One such battery, for adolescents and adults, has 
been published in this country by Morrisby (1955). While we 
would agree that these could be very useful in theory, if the factors 
or aptitudes were clearly distinguishable, in practice they are far 

1 British workers have [...] named their factors with small letters, [...] 
n, m (mechanical), k [...] 
from satisfactory, partly because the tests do not measure the 
separate abilities with sufficient reliability, partly because they 
overlap so strongly, thus confirming Spearman’s view that g 
(or g + v) is predominant in almost all abilities. When such tests 
are applied to secondary school pupils or college students who are 
taking a variety of courses, it is always the Verbal and Reasoning 
tests that give the highest correlations with scholastic grades; only 
to a slight extent do numerical, spatial or mechanical tests help to 
improve the predictions of mathematical, scientific, technical or 
other achievement. 

The problem of differentiating abilities along different lines by 
means of objective tests is, therefore, still a very difficult one. 
During World War II, the American Army relied largely on a 
General Classification Test, which was a composite of verbal, 
numerical and spatial items. Numerous additional mechanical, 
mathematical and other tests were employed to help in allocating 
to special jobs in the Services, and the U.S. Air Force, in particular, 
developed an extensive series of printed tests and practical or 
psychomotor tests (e.g. of hand and eye co-ordination) for select- 
ing air-crew. In the post-war years, however, the Services have 
tried to divide up jobs into a limited number of ‘aptitude areas’. 


For example, the Army distinguishes: 

Combat jobs 
Electrical and Electronic 
Armaments Maintenance 
Vehicle Maintenance 
Clerical, and 
General Technical jobs (Military Police, Cook, etc.) 

Every recruit takes a battery of 12 varied group tests, two of these 
having been shown to be particularly relevant to each area and 
less highly correlated with success in other areas. As far as possible, 
then, he is assigned to the area in which he scores most highly. 
Note that this is an ad hoc classification; neither the areas nor the 
tests represent well-defined ability factors. Differentiation also is 
very incomplete: many men score high, and others low, on the 
tests for all areas. Nevertheless it is a relatively effective system 
which can be applied on a large scale, and which does represent 
some advance over ordinary intelligence testing. 

Despite the partial confirmation of Spearman’s views by later 
research, we realise now that he was quite wrong in claiming that 
the conventional individual or group intelligence test supplies an [...] 


24 INTELLIGENCE AND ATTAINMENT TESTS 


Eduction of several kinds of relations
Foresight factors
Adaptive Flexibility
Six types of Memory
[...] of Fluency
Verbal Comprehension
Evaluative and Judgment abilities
Numerical Facility
Sensitivity to Problems
[...]
Originality
Visualisation, etc.


[...] the new-type test for measuring knowledge of school or college
subjects may be said to have reached its highpoint in the Pennsylvania
surveys, around 1930, by Learned and Wood (1938). The attainments of
many thousands of students in different colleges were measured in
their second and
again in their fourth years, and the extraordinary variations, not 
only between different students but also between whole colleges, 
were revealed. For example, one-quarter of the younger groups 
scored higher than the average student who had had two further 
years of schooling. Particularly noteworthy, also in the 1930s, 
were the numerous standardised tests of attainment in almost 
every subject issued by the Co-operative Test Service for use at 
high school and college levels. But criticism of this type of test 
was rising, on the grounds that it encouraged the cramming of 
unrelated facts. R. W. Tyler, E. F. Lindquist and others realised 
the importance of the effects of testing or examining on the 
curriculum and teaching, and did much to show that objective 
test items could be devised which, instead of concentrating on 
factual details, required students to think, organise and apply 
their knowledge (Lindquist, 1951).

As early as 1921, Thorndike had included reading compre- 
hension items in his Intelligence Test for College Entrants. 
Candidates were given a series of paragraphs to read, and each 
was followed by several new-type questions designed to show 
whether they could extract information from the material, and 
make inferences and judgments on the basis of their understanding.
Such reading tests often proved to be more predictive of success in
college courses than either the conventional group intelligence test
or the attainment test covering knowledge of a particular subject. To
an increasing extent, therefore, during the 1940s and 1950s, American
educational tests have adopted complex comprehension items. In
science, for example, some problem is described, certain experimental
evidence given, and then multiple-choice questions require the
candidate to infer solutions in the light of this evidence and of his
scientific training, to indicate what further experiments are needed,
and so forth. Similarly, in social studies, a political speech or a
cartoon may be presented, and the candidate has to exercise critical
judgment in drawing conclusions from the material. It is claimed that,
if schools try to coach pupils to perform well at such tests, they
will be teaching them effectively rather than cramming them with
facts. Moreover, such tests are likely to be more fair to schools whose
courses in science or social studies are more adventurous, and they
are less closely linked to any particular syllabus or textbook. The
Educational Testing Service at Princeton, and Lindquist at Iowa,
have contributed principally to this production of tests which are
educationally sound as well as statistically reliable and valid. The
former organisation (which has absorbed the Co-operative Testing
Service) is responsible for much of the selection of students
for independent colleges, universities and graduate schools. At
present it tests over a million students a year.

It is still open to doubt whether tests of this kind really bring
out the students’ judgment, organisation, understanding, etc., as
effectively as can the essay examination, despite the great advantage
of their objectivity of marking. Instead they appear to
depend largely on general intelligence and vocabulary (or
VFR factors), and on the students’ facility in coping with these
complex multiple-choice items (cf. Vernon, 1958). But such
questions are being actively investigated; and the care which is
given in America to the development of scientific forms of examination
contrasts markedly with the conservative reliance in
Britain on essay examinations whose weaknesses are notorious.

One other interesting trend is the development of mechanical
aids. Most large testing organisations, and many university
departments which have to examine big groups of students, employ
machine scoring. Instead of the pupil, student or other testee
ticking or underlining the right answers to multiple-choice items,
[...] appropriate space on a special answer sheet. The sheet is fed
into a machine which summates electrically the correct black marks,
and so [...]

[...] that factorial studies of, say, 100 tests can be [...] British
psychologists [...] may be one reason for [...] Americans [...]
intelligence and educational testing have come to stay.


Chapter Two
THE NATURE OF INTELLIGENCE


The word ‘intelligence’ goes back to Aristotle who, as Burt
(1955) points out, distinguished orexis—the emotional and moral
functions, from dianoia—the cognitive and intellectual functions.
The latter word was translated by Cicero as intellegentia (inter
within, legere to bring together, choose, discriminate). During the
past century, however, there has been a great deal of controversy
over its more precise definition, and in particular over the following
points. First, is intelligence a unitary faculty of the mind? Are
our diverse cognitive capacities—perceiving, thinking, imagining, 
learning, recalling, together with special abilities along different 
lines—all functions of this intelligence, or are they relatively or 
completely independent? Secondly, is intelligence inborn or innate, 
inherited from our ancestors, and therefore constant throughout 
life (apart from its natural growth during childhood and eventual 
decline with senescence); or is it partly or even wholly dependent 
on upbringing and education? We still lack any definition of in- 
telligence which is acceptable to the majority of psychologists, and 
which might provide a sound basis for constructing intelligence 
tests. Nevertheless, as will appear below, the answers to both these 
questions have been illumined by scientific investigations with in- 
telligence tests; and in the post-war years there seems to have been 
some rapprochement between previously conflicting approaches and
the emergence of a broader and more satisfactory theory.

In 1921 a symposium was published in the American Journal
of Educational Psychology on “Intelligence and its Measurement”, in
which thirteen prominent psychologists put forward thirteen
largely different views. However, these, and the many other attempts
to define intelligence, can be usefully grouped under three
main categories, which we may call (a) the biological, (b) the
psychological, (c) the operational.

BIOLOGICAL APPROACHES


Nearly one hundred years ago Charles Darwin and Herbert
Spencer pointed out that, in the course of evolution, the increasing
size and complexity of the higher brain centres had been
accompanied by increasing flexibility and complexity of behaviour.
In lower species we observe relatively [...] responses to
environmental stimuli, based on fixed [...] reflexes or complex
instincts, whereas higher species, culminating in man, are more
adaptable and versatile. Thus intelligence has often been defined as
capacity for profiting by experience, adaptation to environment,
[...] or ability to learn. Spencer and his followers—Lloyd Morgan,
McDougall and Binet—thought of intelligence as an inherited and a
general capacity. Both in the evolution of species and in the
development of the individual the various more specialised sensory,
thinking and other [...] progressively differentiate or grow out of
this general adaptive ability.

There are, however, numerous difficulties in accepting the
biological conception in its original form, and further difficulties
in applying it to human intelligence in practice. It is obvious, for
example, that our current intelligence tests make no attempt to
measure modifiability or learning [...] curious that many people who
[...] intelligent, and who do quite [...]

The recent work of comparative psychologists has led to considerable
changes [...] emphasising the [...] shown that many [...] and
universal to [...] can no longer accept a straightforward [...]
complexity of adaptation and mere size or complexity of the brain.
Thus [...], and ants learn mazes almost as readily as rats do though
far [...] Different species, [...] types of adaptation. Within the
human [...] characteristic that has [...] such as idiots, [...]
noticeable abnormalities. There is also the very puzzling fact that
brain damage, or operations like leucotomy, and even removal of
whole sections of the brain, often have no permanent effect on
people’s daily life adaptation or their performance at intelligence
tests.

Nevertheless a generalisation along the following lines would
seem acceptable in the light of modern biological and psychological
knowledge: in lower species, the animal’s behaviour is
more directly and immediately determined either by its organic
structure (innate neural and biochemical mechanisms), or by
external stimulation to which it becomes conditioned, or both;
whereas at higher levels, intervening processes occur to a greater
extent in the central nervous system between the stimulus and the
response, and these culminate in what we call thinking. Thus
Köhler, in his famous work on apes, did not consider behaviour
“as being intelligent when human beings or animals attain their
objective by a direct, unquestionable route, which . . . arises
naturally out of their organisation”. ‘Insight’, for him, involved a
mental reinterpretation or restructuring of the problem situation.
Such a view links up not only with the work of Hebb and Piaget,
which we shall outline below, but also with practical intelligence
testing. For the more complex intellectual problems which
intelligent humans (or rats or apes) can solve are those that involve
most internal thinking, and which make most use of abstract
concepts.

PSYCHOLOGICAL DEFINITIONS


Many psychologists have been less concerned with the evolution
of intelligence in the animal world than with the particular
cognitive functions which are most characteristic of human
intelligence. Arguments have been advanced for a great variety
of mental faculties, including planning ability, foresight, originality
and problem solving. Terman particularly emphasised
capacity for abstract thinking. Binet frankly regarded intelligence
as a complex set of qualities, including:
(i) the appreciation of a problem and the direction of the mind
towards its execution; (ii) the capacity for making the necessary
adaptations to reach a definite end; (iii) the power of self-criticism.
Elsewhere he writes that the fundamental quality is “judgment,
otherwise called good sense, practical sense, initiative, the faculty
of adapting oneself to circumstances. To judge well, to comprehend
well, to reason well, these are the essential activities of
intelligence.”

Spearman, as we have seen, interpreted g as “the eduction of
relations and correlates”, and Knight (1943) expands this into his
definition of intelligence [...] thinking, directed to [...]
writers, he distinguishes this from acquired knowledge or skills.
The acquisition of these [...] hence most educational tests
correlate [...] to temperament or character [...] (Thorndike’s
attempt [...] in such unintelligent mental activities as [...]
mechanical thinking, or in uncontrolled fantasy [...] feature of
these different views is not that they disagree but that all [...]

[...] electricity can be put to practical uses because we can control and
measure it, so we can construct and apply intelligence tests provided
we can demonstrate that they enable us to make certain
predictions about children and adults. It is questionable, however,
whether we would ever have reached the stage of making useful
predictions had not Binet and his successors had some psychological
theories regarding what they were trying to measure.
Spearman, too, as we have seen, believed that g was operationally
definable—a factor which emerged from analysing the correlations
between tests, regardless of the particular abilities tested or
the theories on which they were based. And whether or not one
accepts his notion of g, one cannot gainsay the fact of positive
overlapping between all abilities. Any satisfactory theory of
intelligence must be able to explain why it is that the child or
adult who is superior in reasoning problems is also likely to be
above average, not only in memorising and vocabulary but also
in arithmetic, mechanical comprehension, and even in handwriting,
reaction time and sensory discrimination.¹ Another
finding from Spearman’s work which is fully substantiated is the
existence of a kind of hierarchy of abilities, in the sense that the
[...] functions generally show stronger overlapping—or a greater
involvement of his g-factor—than do the [...] functions and
sensory-motor capacities.

¹ Admittedly correlations among the latter tests may sometimes sink to zero
or even negative values, but usually only in small and highly selected groups.

Beyond this, however, he did not succeed in determining the
nature of intelligence by statistical analysis. His approach broke
down, both because the general factor obtained from any battery
of tests is biased by the kinds of tests used (cf. p. 181), and because
it is entirely legitimate to emphasise—as Thurstone and Guilford
do—the diversity as well as the generality of abilities. Different
types of mental functions and different specialised aptitudes are at
least partially distinguishable, despite their positive overlapping.
Thus it is more consistent with the statistical evidence to regard
intelligence as a fluid collection of abilities, comprising the whole
of mental life, though most prominently manifest in higher
relational thinking. It is a kind of average which cannot be
pinned down to any single mental faculty either by psychological
or statistical analysis; and inevitably it is liable to differ somewhat
according to what different psychologists choose to include
within it. Probably, therefore, the best definition we can give is a
rather simple, non-specific one, such as ‘all-round thinking
capacities’, or ‘mental efficiency’ or, as Burt and Ballard suggest,
general mental ability.
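
The positive overlapping among abilities, and the notion of intelligence as a kind of average running through them, can be illustrated with a small worked sketch. Everything numerical here is an assumption made for illustration: the four test names and the correlation values are invented, not taken from any study, and the first principal component is used as a simple stand-in for a general factor rather than Spearman's own procedure.

```python
# Illustrative sketch: invented correlations, not data from any study.
TESTS = ["reasoning", "vocabulary", "arithmetic", "reaction_time"]

# Hypothetical correlation matrix in which every entry is positive.
CORR = [
    [1.00, 0.60, 0.50, 0.20],
    [0.60, 1.00, 0.50, 0.15],
    [0.50, 0.50, 1.00, 0.20],
    [0.20, 0.15, 0.20, 1.00],
]


def general_factor_loadings(corr, iterations=500):
    """Loadings on the first principal component, found by power iteration."""
    n = len(corr)
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
    # The Rayleigh quotient of the unit vector gives the largest eigenvalue.
    eigenvalue = sum(
        vector[i] * sum(corr[i][j] * vector[j] for j in range(n)) for i in range(n)
    )
    return [x * eigenvalue ** 0.5 for x in vector]


loadings = dict(zip(TESTS, general_factor_loadings(CORR)))
for name, loading in loadings.items():
    print(f"{name:14s} {loading:.2f}")
```

With these invented values every loading is positive, and the reasoning test loads more heavily on the common component than reaction time does. Substituting a different set of tests changes the loadings, which is the dependence on the battery that the text points out.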


THE DEVELOPMENTAL APPROACH 


[...] are really complex acquisitions of early childhood, each having a
long history. For example, Hebb casts doubt on the Gestalt [...]
visual experiences of square [...] angles; these have been combined
[...] the first year or so of life lead to the
formation of groupings or ‘assemblies’ of neurones in the association
areas of the brain. The fully developed perception or idea
involves an autonomous activity or cerebral discharge within the
neurones, which he calls a ‘phase sequence’. Note that this is by
no means the same theory as the old associationist view of lowered
synaptic resistances or engrams which thought of the brain as an
inert transmitting system that was merely activated by incoming
sensory stimuli. Nevertheless, Hebb’s neurological explanations
are somewhat speculative, and the present writer would prefer
to substitute the psychological term ‘schema’ for ‘phase-sequence’
and to regard it as the fundamental unit of perceiving and [...]

The schema, as described by F. C. Bartlett and apparently taken
over by Piaget, is a kind of mental pattern or framework in
which all our past experience relevant to the percept or concept is
integrated; so that as each new impression enters consciousness, it
is charged with all that has gone before. Thus, for example, a
square box comes to be recognised as such more or less regardless
of the distance, the angle of viewing or the surrounding background.
And the world acquires those characteristics of stability,
constancy and structure to which Gestalt psychologists have drawn
attention.

[...] themselves intelligible visual perceptions [...] the validity of
this evidence has been [...] on wide and varied visual and [...] or
human infant needs to [...] gain familiarity with sizes and [...] to
find out how the [...] atmosphere of security [...] brought up as pets
with the free run [...]
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x 
Thus it is highly probable that poverty of early percept a 
perience, or feelings of emotional insecurity, may sens Da Fi 
the child’s whole intellectual growth. Such factors ee cb 
stance, underlie to some extent the differences in inte ger 
tween the middle-class and the slum child. A h Stee 
The early perceptual schemata become organised into E Aa, 
order’ ones—concepts or ideas—which toa still ee brah 
operate as autonomous units in the association areas of ine ai 
Once they are established they become independent of Pa fore 
neurones or brain pathways, according to Hebb, and are ther Thus 
largely unaffected by serious brain injuries or operations. Ne 
mental efficiency in everyday affairs, and correct respons aa 
Binet test items, frequently survive extensive brain Car e aa be 
the capacity for building up entirely new schemata (which ee 
partially tested by so-called ‘concept formation’ tests) does s 
greater signs of deterioration. neh 
One other valuable point made by Hebb is his distinction between
Intelligence A—genetic [...] present mental efficiency. The [...]
the central nervous [...] for forming, retaining and recombining [...]
capacity. We would prefer, however, to apply [...]
to the ability which can be observed in daily life, at school or at
work, and which is sampled fairly effectively by our tests; [...]
the great majority of psychologists would nowadays agree that the
intelligence measure should not be interpreted as pure inborn
ability.


LANGUAGE AND THE SOCIALISATION OF THINKING


Piaget describes how [...] imagery and thinking grow out of overt
trial-and-error behaviour. Even a child [...], say, wishing to
reach a toy from a high shelf, begins to abbreviate his actions.
Instead of reaching, and then getting a chair to raise himself on, he
carries out the earlier stages mentally and goes straight for a large 
enough chair; in other words, he has acquired a schema which fits 
this type of problem situation. During the second and later years
also, language comes to play an increasing part. First it helps in 
the classification and stabilisation of his perceptions. Objects or 
events become identified and sorted out when they are given 
distinct names. Secondly, words provide labels or symbols for 
concepts, that is, for clusters of things or actions. Thus if his
mother says: “Hot, don’t touch”, he can at once react appropriately 
on the basis of schemata that may have been acquired in some 
quite different context. Thirdly, language is used by society to 
pass on its conceptions of the world to the next generation. 
‘Sport’, for example, describes not merely certain activities and 
institutions, but also an attitude of mind which is quite different 
in England from what it would be in France or Russia or Poly- 
nesia. And fourthly, language has been defined as a set of rules for 
saying things that have not been said before, as when a boy re- 
combines previously acquired linguistic schemata to write a 
school composition or to plan an exploit with his gang. Thought 
and the exercise of intelligence can, of course, occur without the 
use of words, but it is clear that intelligent thinking can develop 
only in a social context, where adults and older children, either 
at home or at school, help to enrich the child’s stock of percepts
and concepts and to clarify the relations between them through
speech. Moreover, the highest achievements of intellect in the
creative writer or scientist generally seem to be carried out in
terms of verbal, numerical or similar symbols. Thus intelligence
could hardly develop without this tool that the growing child
acquires from the society in which he is reared.

At the same time, Piaget insists that intelligence is no one dis- 
tinctive faculty—it cannot be reduced to, say, grasping of relations 
or abstract thinking; it is present in all adaptations of lower 
animals as well as of infants and mature adults. Behaviour becomes
progressively more intelligent the more complex the ‘lines
of interaction’ between organism and environment, or as Hebb
would say—the greater the amount of autonomous cerebral
activity. In the child’s early years his behaviour consists
predominantly of concrete and direct reactions to practical experience,
and these sensory-motor adaptations tend to be one-way
and inflexible, i.e. unintelligent. At the higher stages, culminating 
in abstract, logical reasoning, the individual’s thinking is charac- 
terised by mobility—the capacity to transfer to new situations, 
flexibility and reversibility—or ease of manipulation of ideas. 

In between the sensory-motor and the logical stages comes an 
egocentric stage, when the child’s conceptions of space, time,
causation, morality, etc., are largely irrational, inconsistent and
intuitive, bound up with his own needs and interests. Even in the 
company of other children or adults he hardly realises, at 5 years 
or so, that other people’s views of the world can differ from his 
own; whereas by 15, even when thinking in solitude, his con- 
ceptions are socially determined and follow a rational pattern 
which he expects to be acceptable to others. The egocentric stage 
is characterised by animism and magical interpretations, syncretism
(things that happen to occur together are mutually explanatory)
rather than mechanistic causality, arguing by analogy,
and tolerance of contradictions.


We should point out, however, that many psychologists disagree
with Piaget’s notion of children passing through definite
stages of concept development at particular ages. True, he lays
more stress on the regular succession than on the actual month or
year in which the stage appears. But he seems too apt to attribute
such development to internal maturation—that is, to hereditary
factors—although at other times he admits the importance of
contacts with the real world and people in fashioning logical
thinking. Further, he neglects the tremendous variations in level of
development among different children at the same age. Indeed,
subsequent investigations have shown, as we might have expected,
that the stages are more closely associated with Mental Age than
with Chronological Age. But over and above this there are great
variations among different kinds of ideas and different thinking
situations; that is, egocentrism is not so much a natural stage of
childhood as a stage in the development of each class of ideas. It
[...] adults (even those of superior
education and intelligence) when they are thinking about
economic, political, social and religious matters, or bringing up
children, or about personal relations with other people. Much the
same illogicalities and prejudices that Piaget attributed to childhood
are described, among adults, in Thouless’s book Straight and
Crooked Thinking (1930). The average adult’s concepts and thinking
attain a fair degree of rationality when emotions are not
involved, or when the exigences of the physical world have forced
him to be objective—as, for example, in dealing with temporal 
and spatial problems involved in getting to his job on time. But 
equally, children much younger than 5 seem to reach the rational 
stage in some of their practical activities, say in block-building or 
in manipulating furniture to reach a desired object.

This point is an important one since the traditional distinction 
between cognitive and affective aspects of mental life has often 
been carried too far. Intellectual conceptions and reasoning do not 
develop in isolation, as it were. Yet in trying to measure intelli- 
gence we are artificially abstracting intellectual competence from 
the context of sentiments and complexes in which it normally 
manifests itself. However, there is some justification for this in that
our thinking about non-controversial matters usually is logical,
and intelligence tests can be devised within this area and thus
avoid egocentric thinking. We shall see later that, among normal
children and adults, their results seem to be remarkably little
affected by motivation or emotion. Even the neurotic adult does
not, on average, score lower than normal, though at the same time
intelligence does generally correlate positively with superior
character traits. There is some evidence, also, that work-attitude
factors such as persistence and carefulness have a more marked
effect on test performance when the test items are either very
difficult or very easy (cf. p. 182). It is among pre-school children
and psychotic adults that the influence of emotional dispositions
on intellectual functioning is most obvious; and in such cases it
is essential that any testing be conducted by a skilled child
psychologist or clinical psychologist who can, to some extent,
allow for and control these influences.



THE PSYCHOLOGY OF THINKING 


We have already stressed the importance of concepts and rules 
which can be transferred to a wide variety of situations. While 
mental development certainly includes the acquisition of infor- 
mation, at least as necessary is the acquisition of methods of 
tackling the sorts of problems people meet in daily life, or at
school, or in intelligence tests. Harlow (1949) has demonstrated
the same principle in monkeys: given a series of discrimination
problems, their ability to solve them progressively improved.
They had “learned how to learn”. “This learning to learn transforms
the organism from a creature that adapts to a changing
environment by trial and error to one that adapts by seeming
hypothesis and insight.”

However, this is not the whole story. In his recent book (1958),
F. C. Bartlett brings out the variety and complexity of thinking
processes. He shows that, while the term ‘thinking’ can be applied
to almost any mental process which is not merely a response
to external stimuli, it is characterised in particular by the ‘filling
in of gaps’. That is to say, the thinker takes in contributory sources
of evidence and combines these with recalled information. But in
order to reach a solution he has to interpolate or extrapolate from
the evidence, or else reinterpret it and see it in a new light (cf.
Köhler’s account of ‘insight’). Bartlett draws attention to the
importance of transferring [...] generalisations that have been [...]
points out that this is also [...] erroneous thinking. Indeed, [...]
contributory evidence than [...] inappropriate techniques [...]
about human affairs [...] for thinking than the 20-year old, and yet
be relatively less successful [...] them to unfamiliar reasoning [...]
aspect of intelligence is also affected
by personality qualities such as authoritarianism vs. tolerance, and
is thus dependent on the success or failure of early upbringing and
of schooling in encouraging initiative, originality and rationality
of ideas.

Bartlett also offers a useful classification of different species of
thinking. Thus he distinguishes thinking in closed systems, where 
the correct solution is predetermined and the individual has to use 
the evidence to find the right steps for filling in a limited number 
of gaps, from adventurous thinking, where there is freedom of 
method and the thinker cannot know whether his solution is the 
direct or final one. In the latter category he describes the
differences between everyday, scientific, artistic and other kinds
of thought. Intelligence test items, even of the individually 
administered Binet type, necessarily belong to the former cate- 
gory, and therefore omit such aspects of thinking as sensitivity to 
problems, imaginativeness, and what we call ‘wisdom’ or good
judgment. However, it is noteworthy that all these qualities have 
been studied by Guilford (cf. p. 24), though it is not yet clear 
how far his factors are really distinctive, or representative of 


thinking in real-life situations.


INTELLIGENCE AND ATTAINMENTS


We have arrived at the view that intelligence corresponds to 
the general level of complexity and flexibility of a person’s 
schemata, which have been built up cumulatively in the course of 


his lifetime. It would follow that no sharp distinction should be
drawn between intelligence and attainments; nor should we think
of the former as one of the main causal factors in determining the
latter. Both are dependent on, or limited by, genetic factors, that 
is Intelligence A; and it is probable that various specialised attain- 
ments involve particular genes in addition—e.g. those underlying 
musical or numerical talents. But there is no essential difference 
between the acquisition of, say, reading skills and the acquisition 
of reasoning or other capacities which would be conventionally 
regarded as part of intelligence. Both involve the development of
schemata through exercise with appropriate materials, and their 
shaping or correcting by environmental pressures. 

We have been too apt to think of attainments as wholly learned
and dependent mainly on goodness of schooling. Many investigations
have shown that the amount of time given to teaching a
subject in different schools has little connection with the amount
acquired. True, the use of improved educational techniques can
usually be shown experimentally to bring about greater gains, but
the contribution they make is always a small one relative to the
total range of variations in attainment. We recognise the importance
of genetic factors and of general developmental level when
we talk of ‘reading readiness’. Similarly, in arithmetic the learning
of multiplication tables does not consist merely in the accretion of
bits of information through repeated drilling. Certainly practice
is necessary, as it is in the development of our percepts and concepts
and techniques of thinking. But for progress to occur it is
equally necessary for children to explore and discover relations
among numbers, in and out of school.


Nevertheless, a relative distinction still seems useful. Attainments 
refer to mental knowledge, skills and understandings which are more 
directly channelled by the content of the curriculum and the kind of 
training the school (or other instruction) provides, and which 
probably depend to a greater extent on interests and on personality 
qualities such as industry. On the other hand, intelligence refers 
to the more generalised thinking functions: all-round conceptual 
development, techniques of analysing, comprehending, organising, 
learning and problem-solving which, as Ferguson (1954) points out, 
have crystallised out of the child’s previous experience in or out 
of school, and which can be transferred to a wider variety of new 
situations. 

[...] are intermediate and therefore difficult to classify [...]. 
Vocabulary, for example, often turns up in intelligence or in 
English attainment tests; general information was included in the 
Army Alpha test; and problem [...] shown to be one of the best tests 
of Thurstone’s inductive reasoning factor. As pointed out above 
(p. 25), many American educational psychologists have abandoned both 
intelligence and ordinary attainment tests at the senior high school 
and college levels, and prefer to try to measure what they call 
‘developed abilities’, such as reasoning with mathematical, 
linguistic, scientific or social studies materials. The layman might 
suppose that breadth of vocabulary depends mainly on what children 
are taught by their parents and school teachers; and indeed, 
children from middle-class, and particularly from professional, 
homes, where they hear richer and more accurate speech, do tend to 
show better vocabulary than children from working-class homes. But 
in fact neither example nor instruction is likely to enable them to 
define and use difficult words correctly unless they have reached a 
sufficiently advanced stage of mental development to understand the 
underlying concepts. 


CONCLUSION 


We must now try to link up our developmental psychological 
theory with the findings of factor analysis. Although cognitive 
functions are almost infinitely varied, g can be regarded as the 
highest common factor among, or as the overall efficiency of, 
schemata. This view is very similar to that of G. H. Thomson 
(1939), one of Spearman’s earliest critics. He condemned the 
tendency to regard statistically derived factors as unitary powers 
or organs of the mind, and instead thought of the mind as consisting 
of an immense number of bonds or associations. Any one mental test 
would involve the operation of many such bonds, and two or more 
tests would tend to correlate because they draw on the same total 
‘pool’. We can now see several reasons for expecting positive 
overlapping among all cognitive tests. First there is some innate 
quality of the human nervous system which makes men more capable 
than lower animals of acquiring, breaking up and recombining habits, 
percepts, concepts and schemata of all kinds; and some individuals 
have greater genetic potentiality than others. 

Secondly, the essentially cumulative nature of mental growth 
implies that those who early in life acquire a larger stock of 
perceptual schemata and verbal habits, are better able to build up 
more complex and more flexible schemata necessary for conceptual 
thinking. Moreover, the later-developed, higher-order ones will 
naturally be more inclusive or more g-saturated, whereas the earlier 
sensory and motor ones will appear relatively differentiated and 
specific. Thirdly, some individuals are reared in a richer, more 
stimulating and more emotionally adjusting environment than others, 
and this contributes throughout childhood, adolescence and early 
adulthood to the abilities they manifest in almost any kind of test. 
Others are relatively starved or emotionally frustrated, so that 
they fail to throw off the emotional components of primitive 
thinking; or their schemata become set and rigid, thus inhibiting 
the acquisition of new ones; and their overall intellectual 
efficiency begins to decline, perhaps even before they have grown to 
full maturity. 

Finally, it should be noted that g or Intelligence B should not be 
regarded as something constant in composition throughout the life 
span. Bayley’s (1940) detailed studies of 0-3-year-old children show 
us a multiplicity of functions growing more or less independently 
out of previously maturing functions. Through the interplay of 
schemata and through the pervasive influence of language, a 
considerable degree of unification emerges. But children’s 
activities, experiences and interests vary so much that different 
kinds of ideas and different skills often show very uneven 
development. We would agree, too, with Piaget’s notion that the 
characteristic thinking processes alter greatly in quality, as well 
as in overall efficiency, as children get older. Such specialisation 
and differentiation naturally continue into adulthood; indeed, the 
whole conception of a ‘general’ intelligence becomes less applicable 
among adults, since their maximum capacities are expressed mainly in 
diverse vocational and social skills and in leisure pursuits. Hence 
ordinary intelligence tests, based rather on linguistic concept 
development and abstract reasoning, provide a less representative 
sampling of their schemata or, in other words, tell us less about an 
adult than they do about a child. 


Chapter Three 
ELEMENTARY PRINCIPLES OF TESTING¹ 


The view has often been expressed, by philosophers and some 
psychologists as well as by laymen, that human qualities are in 
their very nature immeasurable. True, one cannot lay a ruler 
alongside a child’s intelligence or perseverance or musical 
aptitude. Yet the notion of more or less is continually employed in 
human affairs. Every mother rejoices at her 3- or 4-year old’s 
progress in vocabulary, and observes that he is better or worse 
behaved than her neighbour’s children. An Army sergeant is more 
proficient or experienced than a corporal. School examinations 
may have their undesirable features, but passes and fails, 
percentages and other marks and degree classes have become part and 
parcel of the educational system. The psychometrist (that is, the 
psychologist interested in mental measurement) is simply trying to 
turn these relatively vague quantities into purer and more precise 
ones. Many mental tests are, in fact, very similar to examinations, 
though certain precautions are taken to ensure greater accuracy. 
Others more closely resemble the sort of tasks or situations which 
we use informally when judging people’s traits and abilities in 
everyday life. For example, a child’s capacity for making a 
construction with bricks or blocks, or the extent of his vocabulary, 
have been directly adapted as part-tests of intelligence in the 
Terman, Wechsler and other scales. Let us, however, review the 
chief characteristics that distinguish the mental test from the 
unstandardised situations of daily life, or from school examining. 


STANDARDISATION² OF THE TEST SITUATION 
AND SCORING 

It is essential that the test as a whole, and its component items, 
together with their manner of application, should be uniform for 
all ‘testees’ (that is, persons taking the test). Inexperienced 
testers sometimes think that minor alterations do not matter. For 
example, they may ‘translate’ an American test or add to the 
instructions in the hope of making things clearer, or not bother to 
adhere strictly to the time limits. But all these changes make the 
test a new one to which the original norms (cf. p. 4[...]) probably 
do not apply. All reputable tests are published with manuals giving 
instructions for their application which must be followed precisely. 

1 Fuller treatment of this topic may be found in the present writer’s The 
Measurement of Abilities (1956). A helpful account of the different kinds of 
measurement, some of which are, and others are not, applicable in psychology, 
is given by Banks and Burt (1953). 

2 The term ‘standardisation’ sometimes implies that test norms or standards 
are provided; here, however, we are using it to describe uniformity in, and 
control over, the conditions of application of a test and its scoring. 

The scoring of success or failure at each item in the test is 
likewise standardised or objective, in the sense that the personal 
judgment of the tester is kept to a minimum. This is in marked 
contrast to the typical school examination. As described in Chapter 
One, the group intelligence or educational test generally consists 
of multiple-choice or new-type items, so that the testee’s score is 
simply the number of items in which the correct answer is picked 
from the alternatives provided. Indeed, scoring can often be done 
by machine. 

In some scholastic tests, such as arithmetic and spelling, the 
child may write his own answers, since these can be scored right 
or wrong. Partial credits are not given, as these would introduce 
subjective judgment. There are a few group intelligence tests also 
which use creative responses.¹ Quantitative scores on practical 
performance tests are usually obtained by recording the number 
of blocks or pieces of a puzzle put together, or the number of 
graded problems correctly solved in a given time, or the number 
of seconds needed to complete a task. 

1 For instance: Ex. 8 (p. 17), Ex. 13 (p. 76), Ex. 43 (p. 82). 

However, in most individual Binet-type tests the responses are 
quite unrestricted, and here the manual lays down exactly what 
responses are acceptable or unacceptable. For example, the answer 
to a Stanford-Binet 8-year comprehension item: “What’s the 
thing for you to do when you have broken something which 
belongs to someone else?” must imply restitution or apology: 
shame or confession are not sufficient by themselves. 

It should hardly be necessary to point out that these or other 
scoring instructions should be followed implicitly. Yet, in the days 
before doctors were trained to apply Terman-Merrill properly, 
children were sometimes wrongly certified as defective through 
bad scoring. And it is still possible to fail ‘the 11-plus’ through 
mis-scoring a Moray House group test (e.g. missing out a couple of 
pages), though this is improbable since careful checking is 
generally applied. 

At the same time, the objective conditions of giving and scoring 
are not everything. Both in individual and group testing, the good 
tester tries to ensure beforehand that each child is well motivated 
to do his best, avoids distracting stimuli and encourages 
frequently throughout. 


TEST RELIABILITY AND VALIDITY 


The test must be sufficiently extensive, and its component items 
well chosen, to yield a trustworthy or stable total score. 
Psychologists use the term ‘reliable’ if the test gives the same, or 
nearly the same, results when applied on two or more occasions or by 
two different testers; or if a closely parallel test is concordant 
(cf. Cronbach, 1949). No mental test is perfectly reliable in this 
sense, since human beings naturally vary in their responses on 
different occasions, or to different, though parallel, sets of 
items. The ordinary school essay or examination is particularly 
unreliable, not only because the marks are dependent on the 
examiner’s subjective judgment, but also because pupils write better 
on some topics than on others. 
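
The agreement between two applications of the same test is usually 
summarised as a correlation between the two sets of scores. The 
following Python sketch is an illustration only: the function name 
pearson_r and the five pairs of scores are invented for the example 
and are not taken from any published test. 

```python
from math import sqrt


def pearson_r(first, second):
    """Product-moment correlation between two equally long score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_first) * (y - mean_second) for x, y in zip(first, second))
    sxx = sum((x - mean_first) ** 2 for x in first)
    syy = sum((y - mean_second) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)


# Hypothetical raw scores of the same five children on two occasions.
first_sitting = [12, 18, 25, 31, 40]
second_sitting = [14, 17, 27, 30, 41]
retest_reliability = pearson_r(first_sitting, second_sitting)
print(round(retest_reliability, 2))
```

A coefficient near 1 means the children keep almost the same order 
on both occasions; it says nothing about what the test measures. 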


Most published tests consist of items which have been tried out 
beforehand, both in order to ensure that the desired range of 
difficulty is covered and to see that each one gives results 
consistent with the results of the test as a whole. Items that fail 
to discriminate good from poor scorers probably contain unsuspected 
ambiguities, and they are either revised or eliminated. We cannot 
discuss here the detailed techniques of test construction, but we 
should note that the initial formulation of items, the trials and 
revisions and the eventual standardisation of a test are extremely 
intricate (cf. Anstey, 1948; Vernon, 1948). It is said to take some 
three years to produce a Moray House test. This, of course, is one 
reason why tests usually work better than conventional school 
examinations. An account of the factors that influence a test’s 
unreliability is given in Chapter Seven. 
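
One simple way of checking whether an item discriminates good from 
poor scorers is to compare the proportion passing it in the upper 
and lower halves of the trial group. The sketch below assumes this 
upper-lower comparison; it is a common elementary index, not 
necessarily the technique used by the test constructors cited, and 
the names and figures are hypothetical. 

```python
def discrimination_index(item_passed, total_scores):
    """Proportion passing in the top half minus proportion in the bottom half.

    item_passed  -- 1 (pass) or 0 (fail) for each testee on one item
    total_scores -- each testee's total score on the whole test
    """
    if len(item_passed) != len(total_scores):
        raise ValueError("one item result is needed per total score")
    half = len(total_scores) // 2
    if half == 0:
        raise ValueError("need at least two testees")
    # Testee positions ordered from the lowest to the highest total score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:half], order[-half:]
    p_low = sum(item_passed[i] for i in low) / half
    p_high = sum(item_passed[i] for i in high) / half
    return p_high - p_low


# Hypothetical trial of six children.
totals = [5, 9, 12, 20, 25, 30]
sound_item = [0, 0, 1, 1, 1, 1]    # passed mostly by the better scorers
suspect_item = [1, 0, 1, 0, 1, 0]  # passed rather more often by the weaker

print(discrimination_index(sound_item, totals))
print(discrimination_index(suspect_item, totals))
```

An item with an index near zero, or a negative one, would be a 
candidate for revision or elimination. 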

Note that reliability does not mean the accuracy of the test in 
measuring the ability it is supposed to measure, only the 
dependability of the instrument itself. The accuracy or value of the 
test is called its validity. Thus a child’s height can be measured 
with almost perfect reliability, but it has very little validity as 
a measure of his mental ability and only a moderate validity as an 
index of his overall physical development. Clearly a test may have 
different validities for different purposes, and there are several 
ways of establishing these. 

Mere inspection of the content of the test is seldom sufficient. 
For example, it is obvious that a spelling test consisting of sample 
words of the kind that children need to use, and frequently 
mis-spell, has validity. Yet even here we know from experimental 
researches that dictation tests of sample words do not measure quite 
the same ability as is involved when children write compositions 
with few, or many, spelling errors (p. [...]); also that a 
multiple-choice test where children have to underline the correctly 
spelled version among several mis-spelled versions of the same word 
has almost as high a validity as does a dictation-type test, 
although, to mere inspection, it might appear considerably less 
valid. Introspections from testees regarding the processes they use 
in answering a test also sometimes provide useful evidence. 


[...] Numerous tests [...] to involve, say, manual dexterity [...] 
quite useless. Again it is dubious [...] tests (p. 25) of 
comprehension of paragraphs, designed to evoke judgment, logical 
inference, critical thinking and so forth achieve their aim (cf. 
Vernon, 1958). Particularly in the case of the intelligence test 
neither the opinion of the psychologist who devises the test, nor 
that of the testees, can decide whether the items are valid or not. 
The judgments of some critics, to the effect that Moray House tests 
measure ‘slickness’ or other such qualities, not genuine 
intelligence, are entirely worthless in the absence of experimental 
evidence. 

[...] approach, too, is less [...] the qualities the test is 
supposed to measure. Thus no one has ever been able to validate 
university Arts degrees, since there is no agreement as to the 
ultimate objective of an Arts course and no means of ascertaining 
which graduates have been successful or unsuccessful subsequently in 
this respect. Often, too, the correlation between a test and a 
criterion is less important than the fact that the test adds 
something to already available data, or helps to differentiate the 
testees’ suitability for two or more different outcomes. Thus a 
technical aptitude test which gave the same predictions as 
intelligence and arithmetic tests at 11-plus, or which correlated as 
highly with grammar school as with technical school performance, 
would be of little value (cf. Chapter Seven). 

Here, too, intelligence tests are specially tricky because of the 
lack of any definitive criterion. Teachers’ judgments of children 
have sometimes been used, but are unsatisfactory since they do 
not readily distinguish intelligence from scholastic success or from 
good classroom behaviour. Indeed, the main object of such tests 
is to provide a means of assessment which is independent of 
subjective judgment. Although Binet, in constructing his original 
scales, did in fact attempt to choose tasks which differentiated 
children called intelligent by their teachers from those called 
dull, he also aimed, as we have seen, at items which appeared 
representative of children’s higher mental development, and ensured 
that each one was passed by greater proportions of older than of 
younger children. Nevertheless these criteria are insufficient; for 
example, they might be met by asking ‘What is 7 × 9?’, which 
would hardly be accepted by most people as a good test of 
intelligence. 

The most appropriate approach in such circumstances is known 
as Construct Validation. Here it is implied that the psychologist 
designs a test to measure a certain hypothetical quality or 
‘construct’ and, in the absence of satisfactory external validation, 
seeks indirect evidence that the theory underlying his test is 
sound. Thus Spearman put forward the g factor as a theory to explain 
the correlations among varied cognitive tests, and was able to show 
by factor analysis that tests involving abstraction and other higher 
intellectual processes were highly saturated with g. The later 
developments that we have described [...] the theory, but factor 
analysis is still the chief tool employed in [...]. 


Suppose, now, that someone proposes to test a [...]; he would first 
make up [...] to involve this faculty, and would [...] show that 
these correlated among themselves, and [...] did measure something 
in common over and above [...] would be attributable to g or V or R 
or other well-established [...]. The results should also correlate 
with teachers’ [...] students’ judgment, though this evidence would 
[...] convincing because of the difficulties of defining [...] 
quality. It might be more useful if students taught [...] 
specifically aimed to improve their judgment showed higher gains in 
test scores than those taught in more conventional courses. 

It will be seen, then, that the validity of a test is a complex 
conception which seldom admits of a single clear answer. 


TEST NORMS 


Neither the number of items a child accomplishes in an intelligence 
or attainment test, nor other scores or examination marks, have any 
meaning in themselves [...] compared with [...] representative age 
group. And so the system of expressing abilities in terms of Mental 
or Educational Ages and Quotients was invented by Binet, Thorndike 
and Terman. It should be noted [...] do not attempt to set a 
standard [...], but simply represent what children in general do do 
under present conditions of upbringing and schooling. For example, 
an average 10-year old will obtain [...] Word Reading Age [...] but 
cannot [...] abilities numerically, it has [...] adolescents and 
adults. 

[...] nearly as possible) the top 10 per cent of a representative 
group of young adults from the bottom 90 per cent. The 50th 
percentile is the mark halfway down the list, often called the 
median. If we are told a series of percentiles for a test such as 
the 98th, 90th, 75th, 50th, 25th, 10th and 2nd, we can readily 
interpret whether any person’s original or ‘raw’ score represents a 
good, average or weak performance. In effect, then, percentiles 
provide the same information as would I.Q.s or E.Q.s ranging from 
about 130, through 100 down to 70, or the same as the number of 
years by which a child’s M.A. or E.A. exceeds or falls short of his 
C.A. 
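
As a rough illustration of how a raw score is referred to a 
standardisation group, the following Python sketch computes the 
percentile standing of a score. The function name percentile_rank, 
the convention of counting half of any tied scores, and the ten norm 
scores are assumptions made for the example only. 

```python
from statistics import median


def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group falling below raw_score.

    Testees tied with raw_score are counted as half below, half above
    (one common convention; published tests may use another).
    """
    if not norm_scores:
        raise ValueError("the norm group must not be empty")
    below = sum(1 for s in norm_scores if s < raw_score)
    tied = sum(1 for s in norm_scores if s == raw_score)
    return 100 * (below + 0.5 * tied) / len(norm_scores)


# Hypothetical raw scores of a small representative group.
norm_group = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

print(percentile_rank(25, norm_group))  # standing of a raw score of 25
print(median(norm_group))               # the 50th percentile of the group
```

In practice the standardisation group would of course be far larger 
than ten persons. 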

This system is particularly convenient when it is difficult to 
secure complete age groups for standardisation purposes. Indeed, 
it is often more useful to know the percentiles for, say, 14-year 
olds in secondary modern schools and the (different) percentiles 
for 14-year olds in typical grammar schools, than it is to have 
norms for 14-year olds in general. 

The National Institute of Industrial Psychology issues percentile 
norms in different types of school for most of its group tests. 
Again, the British Army standardises its tests mainly in terms of 
percentiles for unselected recruits or for officer-candidate 
populations. Scores at the 90th percentile or above are referred to 
as S.G. (Selection Grade) I; 90th to 70th as S.G. II; 70th to 50th 
as III+; 50th to 30th as III−; 30th to 10th as IV; and scores 
obtained by the bottom 10 per cent as S.G. V. By this coarse, yet 
quite effective and easily intelligible, method a recruit’s results 
on a series of diverse tests all become comparable. For example, a 
man with S.G. I or II on verbal, clerical and arithmetic tests, but 
S.G. IV on mechanical tests, would be better allocated to a 
clerical than a practical-mechanical job. Had we, however, 
merely known his ‘raw’ test scores, no such comparison would be 
possible. And the same is true of pupils’ and students’ examination 
marks in different subjects, since these are seldom standardised in 
any way; they merely reflect the particular examiners’ opinions 
of what levels of ability should be labelled 70 per cent or 50 per 
cent, or 1st class or fail, etc. 
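
The Selection Grade scheme amounts to a small lookup from percentile 
to grade. The sketch below takes the six grades as I, II, III+, III-, 
IV and V with cut-offs at the 90th, 70th, 50th, 30th and 10th 
percentiles; treating each cut-off as belonging to the higher grade 
is an assumption, and the recruit’s figures are hypothetical. 

```python
# Lower percentile bound of each grade, from the highest grade downwards.
GRADE_BOUNDS = ((90, "I"), (70, "II"), (50, "III+"), (30, "III-"), (10, "IV"))


def selection_grade(percentile):
    """Return the Selection Grade label for a percentile standing (0-100)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    for lower_bound, grade in GRADE_BOUNDS:
        if percentile >= lower_bound:
            return grade
    return "V"  # the bottom 10 per cent


# Hypothetical percentile standings of one recruit on four tests.
recruit = {"verbal": 85, "clerical": 92, "arithmetic": 74, "mechanical": 20}
grades = {test: selection_grade(p) for test, p in recruit.items()}
print(grades)
```

Because every test is expressed on the same six-point scale, the 
recruit’s strength on the clerical side and weakness on the 
mechanical side can be read off directly. 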


When percentiles can be obtained from representative age groups, it 
is possible to convert them directly into equivalent I.Q.s or 
E.Q.s, i.e. without the intervention of Mental and Educational 
Ages; and this is roughly the procedure adopted in the various 
Wechsler scales and the Moray House, National Foundation for 
Educational Research and other modern tests of intelligence and 
attainments. For example, in standardising a test at the 11-plus 
level, it is given to a large, representative group aged 10:0 to 
12:0, subdivided into month-groups (10:0, 10:1, 10:2, etc.). 
Certain percentiles are extracted from each month-group, plotted on 
a graph and smoothed. Now it is known that approximately the top 2½ 
per cent of children normally obtain I.Q.s of 130 upwards; the top 
9 per cent obtain 120 upwards, and so on all down the scale. Thus 
the 97½ percentile line or curve on the graph gives the 130 I.Q. 
(or E.Q.) norm for children of any age from 10 to 12, and the 91st 
percentile curve yields the 120 I.Q. norm, and so on. A conversion 
table is then drawn up from which may be read off the quotient 
corresponding to each score at each age. Actually, however, these 
are not quotients at all, since at no stage are M.A.s or E.A.s 
involved. As the authors of the tests point out, they are standard 
or sigma scores,¹ to use the technical term. Alternatively they are 
referred to as deviation I.Q.s, to distinguish them from classical 
I.Q.s. Nevertheless, in secondary [...] they are usually referred 
to as intelligence (or English or arithmetic) quotients and are at 
least as meaningful as such quotients. 

If it is desired to express adult test results in terms of 
quotients, this is the only way of doing so. Strictly speaking, no 
mental test can yield a M.A. [...] performance of persons aged over 
15 [...] than that of 15-year olds. The Terman-Merrill, and some 
other tests, are actually scored up to higher ages, and these are 
used to calculate quotients. But such units for [...] 

Certain other types of test norms are occasionally used. For 
example, American and Australian educational tests frequently 
quote Grade, instead of Age, norms. Thus a Grade-score of 4·5 
means that the pupil scores halfway between representative 4th 
Grade and 5th Grade pupils. It would be equivalent to an 
Educational Age (by American standards) of approximately 10:0 years. 
But the great majority of modern tests employ the percentile, or 
standard score, systems or some derivative thereof. 

1 This is explained more fully in Chapter Seven (cf. Fig. 6, p. 108). Any raw 
score can be converted to a sigma score by expressing it as a deviation from 
the mean and dividing by the Standard Deviation. [...] standard score provides 
a [...] transformation by multiplying sigma scores by some constant, e.g. 15, 
and taking a conventional mean, e.g. 100. 


Chapter Four 
INDIVIDUAL INTELLIGENCE TESTS 


THE TERMAN-MERRILL SCALE 


TERMAN and Merrill’s 1937 Revision of the Stanford-Binet scale 
is the most extensive, and the most convenient, test for measuring 
the intelligence of individual children. The two forms, L and M, 
are both applicable from about 2 years up to superior adult level 
(Mental Ages 1:7 to 22:10), though they show considerable 
weaknesses at the top end. Form M is generally reserved for 
re-testing. The items are grouped under successive age-levels 
(usually half a dozen at each level): 2, 2½, 3, 3½, 4, 4½, then by 
years from 5 to 14, and finally four increasingly difficult adult 
levels. Thus at the pre-school level each item carries one month’s 
credit of Mental Age, from 5 to 14 two months each, and at the top 
end up to [...] months each. As explained in Chapter One, no testee 
is given the whole scale, only those levels appropriate to his 
ability. First the tester finds the year level at which all six 
items are done correctly (the basal M.A.); then testing is continued 
until a level is reached at which none, or only one, can be 
answered. Testing normally occupies anything from 30 minutes for a 
co-operative 7-year old to 75 minutes for a resentful adolescent; 
but the time also depends largely on the experience of the tester 
and his familiarity with the directions and scoring, which he should 
know practically by heart. 
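
The way month credits build up a Mental Age from the basal level can 
be sketched as below. The example is confined to the pre-school and 
5 to 14 year levels, whose credits (one and two months per item) are 
stated in the text; the adult levels are deliberately left out, and 
the function names and the pattern of passes are hypothetical. 

```python
def credit_per_item(level_years):
    """Months of Mental Age credited for one item at the given age level."""
    if 2 <= level_years < 5:
        return 1  # half-yearly pre-school levels
    if 5 <= level_years <= 14:
        return 2  # yearly levels from 5 to 14
    raise ValueError("adult levels are not covered in this sketch")


def mental_age_months(basal_level_years, passes_above_basal):
    """Basal age plus the credits earned at each higher level.

    passes_above_basal maps an age level (in years) to the number of
    items passed at that level.
    """
    total = int(round(basal_level_years * 12))
    for level, passed in passes_above_basal.items():
        if level <= basal_level_years:
            raise ValueError("levels must lie above the basal level")
        if not 0 <= passed <= 6:
            raise ValueError("between 0 and 6 items can be passed per level")
        total += passed * credit_per_item(level)
    return total


def format_age(months):
    """Write an age in the years:months style, e.g. 8:2."""
    return f"{months // 12}:{months % 12}"


# Hypothetical record: all six items passed at year 7, then 4, 2, 1 and 0.
ma = mental_age_months(7, {8: 4, 9: 2, 10: 1, 11: 0})
print(format_age(ma))
```

Here the seven items passed above the basal level add fourteen 
months to the basal age of 7:0. 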

At each level, four of the six items are starred for use as an 
abbreviated scale. But this [...] fatiguing the child. Thus the 
Picture Vocabulary test provides a [...] numbers of pictures 
identified; [...] graded [...] tests (such as Identification by Use, 
[...]) [...] more positions in the scale, with different passing 
standards, and can thus be scored for all positions at one time. 
Many other items have three or more parts of which only two, say, 
have to be done correctly for a credit; thus it is seldom necessary 
to give the third part. Some testers jump about the scale; for 
example, giving all the Digit Memory, or other related groups of 
items, at one go. This has the additional advantage that all the 
most difficult and fatiguing items do not need to be applied at the 
end. Terman discourages this practice; however, it has been shown to 
have no effect on the I.Q.s of normal children, though it helps 
unstable ones slightly. We would therefore consider it legitimate. 

For M.A.s of 4½ upwards, the tester requires the handbook, 
Measuring Intelligence (1937), a set of printed cards, record forms, 
some special beads, also some blocks, pennies, paper, pencil, 
scissors and a watch with a seconds hand. For lower M.A.s, [...] 
objects and other practical materials [...] is needed. Though the 
instructions for giving and scoring each [...] this test than for 
any other, it is particularly necessary that the tester should be 
thoroughly trained and experienced in their application. 

The published British version of the scale has been partially, 
but incompletely, adapted from the American. British testers are 
advised to copy into their handbooks the revisions suggested by 
the Scottish Council for Research in Education (Kennedy Fraser, 
1945).² As this booklet shows, the order of difficulty of many of 
the items is often inappropriate for British children (and the same 
is true of the words in the vocabulary list). In consequence, the 
‘scatter’ of any child’s performance (from his basal age up to the 
age of complete failure) is apt to be rather large, often as much as 
6 years. Note that this scatter does not, as is sometimes supposed, 
give any indication of the testee’s emotional instability. 

2 Another English revision has been prepared by Burt, but the only part of 
this published is his [...] order of test items (Burt, 1955a). 

For testees of 16 and over M.A.s are divided by 15 to give the 
I.Q. But the gradual deceleration of mental growth between 13 
and 16 is allowed for by using 13:2 as divisor for those aged 
13:3, 13:4 for those aged 13:6 and so on up to 15:0 for those 
aged 16:0. This device is incorporated in the C.A.-M.A.-I.Q. 
tables at the end of the book. 
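
The divisor rule for older testees can be put into a short Python 
sketch. It assumes that between 13:0 and 16:0 every three months of 
Chronological Age count as two, which reproduces the figures quoted 
(13:2 for 13:3, 13:4 for 13:6, 15:0 for 16:0); the behaviour between 
those points is an interpolation, the function names are 
hypothetical, and the handbook tables remain the authority. 

```python
def ca_divisor_months(ca_months):
    """Chronological Age, in months, to be used as divisor for the I.Q."""
    if ca_months <= 13 * 12:
        return ca_months           # up to 13:0 the true age is used
    if ca_months >= 16 * 12:
        return 15 * 12             # from 16:0 onwards the divisor is 15:0
    # Between 13:0 and 16:0 each three months of age counts as two.
    return 13 * 12 + (ca_months - 13 * 12) * 2 / 3


def ratio_iq(ma_months, ca_months):
    """Classical quotient: 100 x Mental Age / (adjusted) Chronological Age."""
    return 100 * ma_months / ca_divisor_months(ca_months)


print(ca_divisor_months(13 * 12 + 3))        # divisor for a child aged 13:3
print(ca_divisor_months(13 * 12 + 6))        # divisor for a child aged 13:6
print(round(ratio_iq(15 * 12, 20 * 12)))     # adult with a M.A. of 15:0
```

An adult with a Mental Age of 15:0 thus obtains a quotient of 100 
whatever his actual age. 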

The main types of items in Form L are shown in Table I, together 
with the numbers of times they occur within certain age ranges. 
It will be seen how largely verbal is the scale, particularly at the 
top end. Though toys and pictures are frequently used, especially 
with younger children, the responses to these as well as the 
instructions mostly involve the use and understanding of language. 


TABLE I 


CLASSIFICATION OF TERMAN-MERRILL ITEMS AT 
VARIOUS AGE LEVELS 


                                                        Ages 
                                               2-4½   5-9  10-14   Ad.  Total 
Practical ingenuity with formboards, blocks      6     3     1     ?      ? 
Following simple instructions                    ?     ?     ?     ?      3 
Spatial: copying, drawing, reproducing 
  designs                                        ?     ?     ?     ?      ? 
Identifying and naming objects or pictures       ?     ?     ?     ?     14 
Vocabulary; defining concrete or abstract 
  words; word fluency; differences between 
  words                                          ?     ?     ?     ?      ? 
Word-relations: analogies, rhymes, 
  similarities, sentence completion              ?     ?     ?     ?      ? 
Pictorial relations                              ?     ?     ?     ?      ? 
Comprehension: seeing absurdities, 
  interpreting stories or proverbs               ?     ?     ?     ?      ? 
Pictorial comprehension                          1     2     3     .      6 
Reasoning problems                               .     .     2     3      5 
Counting and number problems                     .     3     2     3      8 
Immediate memory for digits, words or 
  sentences                                      4     5     5     4     18 
Memory for objects or pictures                   ?     ?     ?     ?      ? 

Total                                           42    31    30    26    129 

[? = figure illegible in the source; . = no items] 


Thus the scale [...] those with [...] artificial and childish for 
adults. Hence, even apart from the doubtful standardisation of the 
Terman-Merrill at the top end, many testers prefer the Wechsler 
scales for older children and adults. 

Our Table also brings out the heterogeneous content of the 
scale at different levels. Although the authors have shown that 
every item correlates with the test as a whole, and though factor 
analysis yields a very large general factor (presumably g + v), 
numerous minor factors (number, space, rote memory, etc.) enter 
to varying degrees at different ages (cf. McNemar, 1942). At the 
same time there are insufficient of these more specialised items to 
justify any differential diagnosis: for example, one cannot safely 
judge that a child is better on the practical than the verbal side, 
or strong in rote memory, and so forth. But though the only reliable 
outcome of the scale is a single I.Q., the individual testing 
situation, the diversity of items, and the fact that all responses 
are creative, do provide valuable opportunities for qualitative 
observation of how the child’s mind is working and how he reacts 
to difficult or frustrating tasks. Thus the experienced tester can 
learn much of value about the child’s personality and [...], 
though he should also be very wary of the reliability of his 
‘intuitions’, and realise that the child’s reactions may not always 
be typical of his behaviour and thinking outside the testing 
situation. 

The scale was very carefully standardised on representative 
American [...] at each age, but nevertheless seems decidedly 
inaccurate in this country from about 11 years onwards. British 
testers constantly find abnormally high I.Q.s [...] adolescents.¹ 
The present writer obtains more [...] counting each year of M.A. 
above 7:0 as [...]. According to Roberts and Mellone (1952), these 
[...] arise, not because the scale is too easy all round but because 
of variations in the Standard Deviation of I.Q.s at [...] (cf. 
p. 105). These authors publish a useful table of [...] based on 
their own findings and on the variations [...] Terman and McNemar. 
Testers would do well to use these, [...] to ensure that the 
original as well as the corrected I.Q. is quoted. 

1 In the representative Scottish sample tested at 11 years there were 3 per cent 
of I.Q.s [...] 10 per cent over 130 (Scottish Council for Research in 
Education, 1949). 

It is likely that a new version of Terman-Merrill will be published 
shortly, [...] the best items from Forms L and M, which will also 
attempt to overcome these irregularities. 

[...], differing factorial content at different age levels, 
unsatisfactory scoring in terms of Mental Ages and I.Q.s, and lack 
of restandardisation for British children. Yet it continues to serve 
admirably most of the practical purposes of child and educational 
guidance. 


THE WECHSLER ADULT INTELLIGENCE SCALES 


The Wechsler-Bellevue scales are the accepted instruments for 
individual testing of adults, as is the Terman-Merrill for children, 
and they are similarly administered. However, they show many 
important differences. 


(a) They are point, not age, scales. Both Forms I and II and
WAIS (Wechsler Adult Intelligence Scale) contain 11 sub-tests,
each covering the whole range of ability [...] scores from 0 to 17
(0-19 in WAIS). The testee's total score is then converted directly
to an I.Q., without bringing in Mental Ages. Thus every testee
starts at the beginning of a sub-test and goes as far as he can.
While the burden both


o hildren, fter correction 
for obvious sources of error) for children fee ung children, or (a 
normal, or for those adults 
[...] performance ones in particular do not provide a very
representative sampling of non-verbal thinking.

(c) The test material is chosen to interest adults rather than 
children. 

(d) Scores on the 5 (or 6) verbal and 5 non-verbal or performance
sub-tests are also totalled separately to yield verbal and
performance I.Q.s. While the verbal I.Q. alone gives a better
indication of scholastic aptitude, Wechsler regards the Full-Scale
I.Q. as providing a fairer picture of all-round intelligence in
everyday life, and therefore as more appropriate for the assessment
of mental defect or for other clinical purposes (there is some
experimental confirmation of this claim).

(e) Verbal, Performance and Full-Scale I.Q.s are all arranged
to have the same Standard Deviation, close to 15, at all ages.
However, the range of items is more restricted so that the tests
do not discriminate quite as effectively as does the Terman-Merrill
either within the very bright or within the mentally defective
categories.
(f) Separate norms are provided at each of the following age
levels: 10, 10½, ... 14½, 15, 16, 17-19, 20-24, 25-29,
30-34, ... 55-59 years. A marked decline with age above 24
occurs on several of the sub-tests and on the total scores. But as the
average I.Q. is fixed at 100 at all ages, this means that older
persons are accorded higher I.Q.s than they are by other tests
(such as Terman-Merrill or group tests) which make no age
allowance.2

However, if desired, older adults can be assessed against the [...]
(20-24 year) norms, and the resulting quotients, [...] decline with
age, are termed Efficiency Quotients instead of Intelligence
Quotients. For instance, the average 55-59 year total score is much
the same as that of 11½-year olds, and is assigned an Efficiency
Quotient of 83.

(g) One or two of the sub-tests can be omitted if they appear
inappropriate, or if time is short, but Wechsler recommends that
not less than 8 should be given. The I.Q. norms are based on 5
verbal and 5 performance tests; with more or fewer than this the
totals are corrected by an appropriate figure. The sub-tests can be
given in any order that the tester likes.

1 The WAIS and WISC I.Q. norms were derived in much the same way as
that described on p. 50, and therefore yield some 2 per cent [...]
and below [...]. The Wechsler-Bellevue was standardised [...]
yielding a decidedly skewed I.Q. distribution, with only [...] per
cent [...].

2 [...] feebleminded patients, the [...] than the Wechsler at age
[...] rising to 20 or more points at age 50-60.

As the Full Scale generally takes about an hour, many other 
psychologists have tried out abbreviated versions. The most 
popular of these is Vocabulary, Information, Similarities and 
Block Design. British Army and Navy psychologists during the 
war often used Comprehension, Vocabulary and Block Design. 


The results of such short scales correlate highly with the Full 
Scale, but are naturally less reliable, and they are not properly 
standardised. 


(h) When the results on the 11 sub-tests are expressed as equivalent
or 'weighted' [...] organic psychoses, etc. Probably the skilled
clinical psychologist [...] such patterns, together [...] features of
particular responses, and thus make a contribution to the
differential diagnosis of mental patients (cf. Patterson, 1953). But
numerous investigations have shown that the statistical
reliabili[...]

[...]isms have been adopted, but the Briti[...] now prepared a
standard list.1
1. General information: 25 questions covering everyday rather
than [...].

2. Comprehension: 10 questions involving commonsense or
practical judgment; scored 2, 1 for weak answers, or 0.

3. Arithmetic problems: 10 problems to be done mentally in
15 seconds to 2 minutes each.

4. Repeating digits forwards and backwards: 14 series, ranging
from 2 to 9 digits, two chances at each series.

5. Similarities: 12 questions of the form "in what way are
so-and-so alike?" Scored 2, 1 or 0 for quality of generalisation.

6. Vocabulary: 42 words; scored 1, ½ or 0.

Lists of acceptable and failing responses to most of these tests
are provided.

7. Picture completion: 15 pictures in each of which some
missing part has to be named; 15 seconds each.

8. Picture arrangement: 6 series consisting of 3 to 6 pictures;
each series has to be arranged in the right order to tell a story, in
given time-limits, and bonus marks are given for more rapid
performance.

9. Object assembly: a set of 3 simple jigsaws representing a
manikin, a profile and a hand; scored for pieces correct and time.

10. Block design: 9 red and white designs similar to those
originally constructed by Kohs (1923), to be copied with 4, 9 or
16 coloured blocks; scored by number correct in given times, with
credit for fast performance.

11. Digit-symbol substitution: a key shows 9 simple symbols
paired with the numbers 1-9. Below is a mixed list of numbers,
and the testee writes the corresponding symbols as quickly as
possible; 1½ minutes.

1 These are published by the Psychological Corporation of New York
and, in Britain, by the National Foundation for Educational Research.
The WAIS test is more recent and is in many respects technically
superior to Forms I and II. Several of the sub-tests are
longer than has been indicated above. It provides norms for age
groups 16-17, 18-19, 20-24, 25-34, 35-44, 45-54, 55-64, and
these have been extended (though on less thorough samples) to
over 75 years.

Repetition of Form I within about a month gives retest rises of
about 4 points (verbal) and 9 points (performance). The retest
reliabilities of the Verbal and Full Scales are similar to that
of Terman-Merrill, that of the Performance Scale rather lower. A
typical correlation between Verbal and Performance I.Q.s is 0.71,
and the median discrepancy 9 points of I.Q.; but occasional
testees may differ up to 30 or 40 points in their two I.Q.s.
The Verbal Scale probably gives no better indication of educational
capacities among normal adolescents and adults than a thorough
group test, but has the advantage of better control of testing
conditions in abnormal cases. Performance I.Q.s may be slightly
more relevant than Verbal to daily life and manual occupations,
but should not be interpreted as showing [...] practical ability
without further evidence. The group factors measured by the
various sub-tests are highly complex, but the Full Scale may be
accepted as a good g test, the Verbal as mainly [...].
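The deviation scoring described in (e) and (f) above can be sketched in a few lines. This is only an illustration under stated assumptions: the norm means and Standard Deviations below are hypothetical figures chosen so that the arithmetic reproduces the Efficiency Quotient of 83 quoted in the text; they are not Wechsler's published norms, and the function name is invented for the example.

```python
def deviation_quotient(total_score, norm_mean, norm_sd, target_sd=15.0):
    """Convert a total score to a quotient with mean 100 and the given SD,
    relative to whichever age-group norms are supplied."""
    z = (total_score - norm_mean) / norm_sd
    return 100.0 + target_sd * z


# Hypothetical norms (mean, SD of total score); NOT the published tables.
NORMS = {
    "20-24": (100.0, 30.0),
    "55-59": (66.0, 30.0),
}

average_older_score = 66.0

# Scored against his own age group, the average older adult gets I.Q. 100.
iq = deviation_quotient(average_older_score, *NORMS["55-59"])

# Scored against the 20-24 year norms, the same total gives the
# Efficiency Quotient, which shows the decline with age (about 83 here).
eq = deviation_quotient(average_older_score, *NORMS["20-24"])

print(round(iq, 1), round(eq, 1))
```

The same total score thus yields two different quotients, depending solely on which reference group supplies the mean and Standard Deviation.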


WISC: WECHSLER INTELLIGENCE SCALE FOR CHILDREN

This is a downward extension of the previous scales, providing
Verbal, Performance and Full-Scale I.Q.s over 5:0-15:11 years.
Equivalent 'scaled' scores for each sub-test are listed at four-month
intervals throughout. Most of the material is identical with that
in Form II, the main differences being:


(a) Digit span is either omitted from the Verbal Scale, or used as 
an alternative. 


(b) An alternative to the Digit-symbol performance test is a
series of Mazes (similar to those of Porteus, p. 69). These are
scored for time and errors.

(c) No Efficiency Quotients are needed, as the scale is not
intended for adults.

The test is probably better standardised than any other individual
scale in America, and there is no reason to suppose that this
standardisation would be seriously at fault in Britain. The I.Q.s
are closely comparable to those of Stanford-Binet and Moray House
tests, but are much more restricted at the top and bottom ends than
those given by Terman-Merrill. The scale [...]
C. W. VALENTINE'S INTELLIGENCE TESTS FOR
YOUNG CHILDREN


This scale is designed specifically to help junior, infant and 
nursery school teachers to do their own testing. The component 
items are mostly very straightforward, and require no apparatus 
beyond the pictures given in the Manual (Valentine, 1948), or 
simple objects that can be prepared at home. Its cheapness is a 
considerable recommendation. How far teachers can be trusted to 
give it properly, unless they have had some training, is more 
doubtful. Directions for giving and scoring should be adequate, 
though they are less full than for Terman-Merrill. 

The items are mostly assembled from other tests (Terman-Merrill,
Gesell Schedules, Porteus Mazes, Burt's Reasoning Tests, etc.),
with a few additions. Either 8 or 10 items are provided at
each age level: 1½, 2, 2½, 3, 3½, 4, 5, 6, 7, 8; 6 for each year from
9 to 13 and 5 for years 14 and 15. Thus the test is suitable for
children of ages 2 to 11 or 12. Though there are more items for
younger children than in Terman-Merrill, it usually takes no


TABLE II

CLASSIFICATION OF ITEMS IN VALENTINE'S
INTELLIGENCE TEST

                                                      Ages
                                               1½-4    5-9   10-15
Motor development, walking, scribbling, etc.     7      0      0
Practical ingenuity and manipulation             3      6      7
Copying and drawing (including mazes)            5      1      0
Following simple commands                       10      3      1
Vocabulary and word fluency                    [...]
Identifying objects and pictures by name         6      7      5
Word relations                                   5      0      0
Picture relations                                1      2      4
Comprehension, verbal                            0      4      0
Comprehension, pictorial                         0    [...]  [...]
Reasoning                                      [...]
Numerical                                        4      8      4
Immediate memory, words and digits             [...]
Total                                           52     40     34
longer to give since they are arranged in better order of difficulty.
Nevertheless, the author recommends that testing be spread out
over two or more sessions if there are signs of fatigue. Two or
three items at each level are starred, but these are not to be used
as an abbreviated scale; instead they are given first in order to
indicate the range over which all items must be applied. With
different numbers of items at different levels the scoring of
Mental Age is a little complicated, but follows the same plan as in
Terman-Merrill.


The items a[...] children, but a[...] Terman-Merrill or WISC f[...]
degree of superiority or inferiority throughout.


INFANT TESTS: GRIFFITHS'S MENTAL DEVELOPMENT
SCALE

There are several American pre-school scales, such as the
Merrill-Palmer and [...], based on play [...] particularly
attractive to 2- to 5-year olds (cf. [...]). But they are not
described here because of the [...] variable from [...] are highly
distractible, easily fatigued, [...]. Nevertheless, psychologists
such as Gesell, Valentine, Bayley and Griffiths rightly point out
that the young child's motor, sensory, language and social reactions
tend to develop in
a very regular order, and do so at different rates in different
individuals. These functions are basic to the later development of
intellectual abilities, and even if a single testing has little
diagnostic value, it is often revealing to trace a child's progress
throughout the pre-school period. Moreover, it is of great
importance to be able, at an early age, to recognise serious
retardation in, say, co-ordination or hearing, which may indicate
pathological conditions.

The best known scale for infants—Gesell’s Developmental 
Schedules—makes no claim to measure intelligence or even to 
give an all-round growth quotient (Gesell and Amatruda, 1947).
These schedules provide norms or standards of development for 
4, 18, 28 and 40 weeks and 12, 18, 24 and 36 months in four main 
areas: bodily co-ordination, eye-hand co-ordination, speech, and 
personal-social behaviour. Bühler and Hetzer’s (1935) more 
elaborate series of tests are scored to give a developmental quo- 
tient, and P. Cattell’s set of infant tests (1940) is designed to con- 
nect up with the bottom end of the Terman-Merrill scale. 
Griffiths’s scale (1954), which we shall describe here, is not only 
the most accessible in Britain, but is also probably the most 
comprehensive and best standardised of any. It contains 3 items 
per week for the first year of life, and 2 for the second year, that 
is, 260 in all. They are grouped under five headings, scores on 
which are practically equivalent throughout, so that the scale 
yields a profile of development under each heading as well as an 
overall result called the G.Q. or General Quotient. The headings, 
together with specimen items for babies a few weeks old and for 
children approaching 2 years, are as follows: 


                      Soon after birth          Approaching 2 years

A. Locomotor          Kicking, head-lifting     Walking downstairs unaided

B. Personal-social    Smiling, recognition      Asking for things at table
                      of mother

C. Hearing-speech     Startled by noises        Uses sentences of 4+ syllables

D. Eye and hand       Following moving lights   Throwing ball into basket

E. Performance        Grasping objects          Assembling parts of a toy
Other items are based on standard toy apparatus,1 and several of
them are filled in from general observation or from information
given by the mother. The time taken for testing is usually under
30 minutes. As in the Binet, tests are given from a point on the
scale where there are 6 successive passes up to a point where there
are 6 successive fails. But the scores are expressed in weeks, and
divided by weeks of age to yield quotients. The Standard Deviation
of G.Q.s is close to 12 at [...] for the five sub-scales are not
[...] lower. Their overlapping is probably high [...] diagnosis from
the profile must be made with great caution. There is no information
regarding future predictive value, but from the results of American
scales we would expect virtually no correlation between a single
testing in the first year and intelligence at 5 years or later.
Finally, we should re-emphasise that the application of tests at this
level necessitates exceptional skill [...] in giving and scoring the
items.
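The arithmetic of the Griffiths scale, as described above, can be illustrated with a short sketch. The item counts (3 per week in the first year, 2 per week in the second) and the rule "score in weeks divided by age in weeks" come from the text; the assumption that every pass earns an equal fraction of a week, and the function names, are introduced here for illustration only and are not taken from Griffiths's manual.

```python
WEEKS_PER_YEAR = 52
ITEMS_PER_WEEK = {1: 3, 2: 2}  # year of life -> items per week (from the text)


def total_items():
    """Number of items in the whole scale (the text gives 260)."""
    return sum(n * WEEKS_PER_YEAR for n in ITEMS_PER_WEEK.values())


def score_in_weeks(passes_year1, passes_year2):
    """Assumed crediting: each pass is worth 1/3 week (first-year items)
    or 1/2 week (second-year items)."""
    return passes_year1 / ITEMS_PER_WEEK[1] + passes_year2 / ITEMS_PER_WEEK[2]


def general_quotient(score_weeks, age_weeks):
    """G.Q. = 100 x (score in weeks) / (age in weeks)."""
    return 100.0 * score_weeks / age_weeks


print(total_items())                          # 260
score = score_in_weeks(156, 20)               # 62.0 weeks
print(round(general_quotient(score, 65), 1))  # 95.4
```

A child of 65 weeks who passes all first-year items and 20 second-year items would, on these assumptions, be credited with 62 weeks and a G.Q. a little below 100.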


[...] blocks, or drawings are naturally attractive to children. Thus
they are useful in establishing good rapport at the beginning of a
[...] oral Terman-Merrill session, even if their results are [...]
diagnostic value. And they provide excellent opportunities [...]
temperamental reactions to difficulties. But they [...] unwieldy and
very costly. And they are so dependent [...] specific abilities [...]
chance factors of unreliability, [...] number must be [...]. Thus a
child with language handicaps, and many [...] schooling, tend to
score higher [...] performance [...]

1 Supplied by the author.
2 Later work by other testers has yielded far less favourable figures.
and some nervous ones who prefer academic to practical activities,
do relatively poorly. Another weakness in performance tests is the 
large practice effect; scores tend to rise markedly on retesting 
even after two years. 

When a miscellaneous set of tests is given, as in the original 
Pintner-Paterson scale, each test has its own M.A. norms, and the 
median of these Mental Ages is usually taken as the final score. 
The trouble with this is that several of the component tests may
be poorly standardised. More accurate norms are generally available
for such point scales as Wechsler's and Arthur's (another
American series), and Collins and Drever's. Here, so many points
are awarded for varying degrees of success on each test, and the
total points are converted to an M.A. or I.Q. by a single set of
norms. This means that the whole battery, lasting perhaps a full 
hour, must always be given; the other system allows more 
flexibility. 
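The first of these two scoring systems can be sketched as follows. The Mental Ages used are invented figures for a hypothetical child, and the function names are illustrative; only the procedure (take the median of the separate M.A.s, then divide by Chronological Age in the usual way) follows the text.

```python
from statistics import median


def median_mental_age(mental_ages):
    """Final score under the miscellaneous-battery system: the median of
    the Mental Ages obtained on the separate tests."""
    return median(mental_ages)


def ratio_iq(mental_age, chronological_age):
    """Classical ratio I.Q. = 100 x M.A. / C.A. (both in years)."""
    return 100.0 * mental_age / chronological_age


# Hypothetical M.A.s (in years) on five separate performance tests.
mental_ages = [9.0, 10.5, 8.5, 11.0, 9.5]

final_ma = median_mental_age(mental_ages)  # 9.5
print(final_ma, ratio_iq(final_ma, 10.0))  # 9.5 95.0
```

Because the median ignores the extreme values, one badly standardised test affects the result less than it would affect a mean; and any test can be left out without upsetting the scoring, which is the flexibility referred to above.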

The Collins-Drever scale was originally devised for testing deaf 
children (Drever and Collins, 1936).1 It was found that they score 
as highly as hearing children on this type of test, though about 
3 M.A. years inferior to the average on verbal tests. The scale can 
be given with oral or with pantomime directions. 

Scale A contains 11 sub-tests: 

1. Kohs Block Design: from 4 to 16 coloured blocks are pro- 
vided which have to be arranged so as to reproduce a series of 10 
patterns; 2 to 4 points are given for each pattern completed in a 
Set time. 

2. Knox Cubes: reproduction of patterns tapped out on 4
cubes. These range from 1, 2, 3, 4 to 1, 3, 4, 2, 1, 2.

3. Dominoes: another memory span test; picking out up to 6
dominoes in correct order from a set numbered [...].

4. Arranging 5 cubes by size and 5 brass weights by weight in
correct order.

5. Manikin puzzle: pieces correct in time limit.

6. Feature profile: ditto.

7. Two-Figure Formboard: 9 rectangular or triangular pieces
to be fitted into 2 frames; scored by time.

8. Healy Puzzle A: 5 rectangular pieces to be fitted in a frame.

9. Cube Construction: 3 large wooden blocks painted on certain
sides; 8 or 9 small cubes to be put together to reproduce these
models; scored by number of cubes correctly assembled in 5
minutes each.

10. [...]: a picture of Little Bopeep from which 12 pieces have
been cut to be reinserted; scored by number in [...].

11. Healy Picture Completion I: a large picture with 10 square
holes, to be filled by choosing among 50 square pieces; scored by
number and aptness of choices in 5 minutes.

1 Material supplied by Baird Scientific Instrument Co., Edinburgh.
[...] total scores and quartiles are listed for boys and girls
aged 5½ to 15½, girls being 1 to 2 years behind boys throughout.
If scores are thus expressed as M.A.s and then divided by C.A. in
the usual way, the resulting I.Q.s are not comparable with Binet
or Wechsler I.Q.s, having a Standard Deviation of about [...].
Hence a separate table of deviation I.Q. norms is provided, [...].
Tentative M.A. norms are given also [...] consisting of Kohs, Cube
Construction and Healy Picture Completion I. I.Q.s obtained from
this will certainly have a very wide spread.

Scale B, consisting of 7 tests, is recommended for younger [...];
[...]-7 year M.A. norms are given.
1. Sizes and weights as in Scale A.

2. [...]

3. Knox Cubes: a simpler series of patterns.

4. Seguin-Goddard Formboard (cf. p. 70); scored by time for
correct completion.

5. Dearborn Triangle Formboard: number of pieces filled in
5 minutes.

6. Mare and Foal: a picture test with missing pieces to be
inserted in given time.

7. Teacher and Class: ditto.


[...] a considerable practical or spatial ability factor in addition
to their g-content. Thus they are regarded as particularly suitable
[...]. Alexander (1935) states that they measure "capacity to [...]
in a concrete situation".1

1 Materials and Manual supplied by Nelsons.

Kohs Blocks and Cube Construction use the same material and
are given in the same manner as in the Collins-Drever scale. But
in order to improve their discrimination a more elaborate scor- 
ing system is introduced, allotting so many points for success at 
the various models and for time taken. The third test is a new one, 
Passalong, consisting of nine boxes each containing one red and 
several blue squares or rectangles. By sliding the pieces around, 
the red block has to be moved from the bottom to the top posi- 
tion. The problems are steeply graded in difficulty, and are 
scored by time for each success. The three tests can be given in 
about 40 minutes. 

The total points are translated into Mental Ages, with separate
norms for boys and girls from 7 to 19. There is no appreciable
increase in score beyond 15, and higher M.A.s are arbitrary. The
resulting I.Q. is called P.A.R. (Practical Ability Ratio). It is said
to be comparable to a Moray House or Terman-Merrill I.Q., so
that a 10-points difference indicates relative suitability for technical
or academic education. But actually the Standard Deviation of
P.A.R.s seems to be considerably higher (over 20 at 11 years)
and the norms are probably too lenient; hence it is far easier to
get high P.A.R.s than verbal I.Q.s. Watts and Slater (1950) suggest
therefore that local percentile or standard-score norms should
be collected if the scale is used widely for allocation purposes. In
the writer's view, paper-and-pencil spatial tests are preferable in
such a situation. They [...] the same abi[...] reliably, are more
readily applied to large numbers, and [...] likely to be upset by
[...] practice or by the handing on of information from boys who are
tested early to those tested later. Alternatively, a complete
revision of Kohs Blocks and Cube Construction for group application,
with smaller blocks and new designs, has been constructed by Jones
and Hey (1952). Norms are supplied for boys around 11 years.
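Why a 10-point difference between a P.A.R. and a verbal I.Q. is misleading when the two quotients have different spreads can be shown with a minimal sketch. The Standard Deviations of 20 and 15 are round figures suggested by the text; the conversion itself is the ordinary standard-score rescaling, not a procedure given by Alexander or by Watts and Slater, and it corrects only for the spread, not for the leniency of the norms.

```python
def rescale_quotient(quotient, sd_from, sd_to=15.0, mean=100.0):
    """Re-express a quotient on a scale with a different Standard
    Deviation, keeping the same standard score (z)."""
    z = (quotient - mean) / sd_from
    return mean + z * sd_to


# A boy with P.A.R. 120 (S.D. assumed 20) and verbal I.Q. 110 (S.D. 15).
par_equivalent = rescale_quotient(120.0, sd_from=20.0)
print(par_equivalent)          # 115.0
print(par_equivalent - 110.0)  # 5.0
```

On these assumptions the apparent 10-point superiority in practical ability shrinks to 5 points once both results are put on the same scale, which is the kind of distortion that locally collected standard-score norms would avoid.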


SETS OF SEPARATE PERFORMANCE TESTS

None of the above scales meets the needs of testers who simply
wish:

(a) to be able to observe the testee working at practical tasks;

(b) to have sufficient tests to provide a fairly reliable contrast
with the strongly verbal Terman-Merrill;

(c) to keep the time spent on such tests fairly short, and if need
be, omit one or two.
The present writer would suggest the following sets, each test
being scored separately, and the median or mean M.A. taken.
They are not suitable for calculating performance I.Q.s.

A. For children with probable M.A. 10 upwards and adults:
1. Kohs Block (Alexander or Trist-Misselbrook).
2a. Porteus Mazes. 2b. Mazes time score.
3. Oakley or Moorrees Formboard.
4. (If time allows) Cube Construction.

B. For children with probable M.A. 4 to 9 years:
1. Goddard Formboard.
2. Porteus Mazes.
3. Goodenough Draw-a-Man.
4. Mare and Foal and/or
5. WISC Kohs Blocks.


Kohs Blocks. No separate age or sex norms are published except
for the original (very lengthy) [...].

Quite a different version for adult recruits was developed by
Trist and Misselbrook during World War II, and is described by
Semeonoff and Trist (1958).

Cube Construction. This is probably a better test than Passalong.
Owing to the lack of norms for other versions, it may be given
according to [...] (1925). However, if Alexander's procedure is
preferred, [...] some very rough norms for boys
PORTEUS MAZES

This well-known test of ‘foresight and planning capacity’ dates
back to 1914. It consists of a graded series of mazes through which
the child has to draw a path without divagations. There is one for
each year-level from 3 to 12, 1 for 14 years and 1 adult. Thus the
score in M.A. years consists simply of the level of the most
difficult maze accomplished, though this is complicated by allowances
for errors which are corrected at second or further attempts.
Neither the norms nor the reliability of the test are very
satisfactory (Tizard, 1951). But Porteus presents considerable evidence
for the claim that it corresponds better than most verbal tests with
adaptability to the social and practical requirements of everyday
life. Scores appear also to be more susceptible to brain injury and
leucotomy operations.


A slightly revised series of mazes was issued in 1933, and the
older series (given, for example, in Burt's Handbook of Mental
Tests) should not be used. Porteus requires a fresh set of blanks to
be used by each testee at each trial (i.e. 2 to 4 copies of any maze at
which errors occur). To save expense we suggest using Valentine's
version,1 where the one set is not marked, but traced by a dry
paintbrush or pen-point.2 In this case the tester must be most
careful not to allow self-corrections; and the full instructions in
Porteus's Manual (1952) should be consulted, particularly in regard
to scoring the 12- and 14-year mazes.

The writer has shown that it is worth recording secretly the
aggregate time spent on Mazes XI, XII and XIV among testees
likely to [...] all three (Vernon, 1937). [...] norms are tabulated
below. Though this speed score correlates moderately with Binet
M.A., it is independent of Maze score and helps to differentiate
the cautious from the impulsive testee. Porteus also presents a
detailed method of scoring carelessness [...] errors which yields a
Qualitative or Q-score. This is claimed to [...] higher (i.e. worse)
among delinquent adolescents and adults than among normals. Though a
promising [...] to make adequate allowance for the level reached on
the test, and there is as yet little independent confirmation of its
validity (cf. Gibbens, 1958).

1 Valentine (1940) does not include the Adult maze, by which the
maximum [...] M.A. years.

2 If an error [...] and start again when the testee reaches the
same position at the next trial.

Fig. 1 Seguin Form Board: Positions of Pieces in Three Heaps
Ready for Re-insertion.


FORMBOARDS

For young children th[...] remarkably effective tes[...], as might
be supposed, on mere manual dexterity. Yet it is very simple,
being [...] shaped wooden pieces [...]. We recommend the large board
with loosely fitting insets, shown in Fig. 1.1 Three trials are
allowed, and the M.A. can be determined from the average[...]
total-time and best-trial norms, as listed in Table III.

1 Made by Baird Scientific Instrument Co., Edinburgh.
Fig. 2 Oakley Form Board.

At a more difficult level there are several choices, though
unfortunately none is at present manufactured in this country. The
[...] or Worcester Formboard (clinical model) can be obtained from
America (cf. Buros, 1953). The Moorrees Formboard has been
described by the writer elsewhere (Vernon, 1937), [...] provided.
The Oakley Formboard, shown in Fig. 2, involves arranging 2 to 6
coloured pieces in each of 7 holes. It is sufficiently complex to
tax the highest levels of practical ability, and provides excellent
opportunities for observing the testee's methods of work (cf.
Oakley, 1935).


GOODENOUGH DRAW-A-MAN TEST

This is a particularly easy test to give, the child simply being
[...] draw the best picture of a man that he can. The [...] quality
of his product does not enter into the scoring; rather a good score
depends on accurate observation and the [...] of concepts of the
human figure and clothing. The presence or absence of 51 specific
points is noted, ranging from: head present, legs present, to
sleeves and trousers non-transparent, opposition of thumb shown, etc.
Chapter Five
GROUP INTELLIGENCE TESTS 


Group tests have obvious advantages where large numbers of
children or adults are to be tested, namely economy of time and
the fact that they can be given and scored by teachers or other
persons without special training. Indeed, they are perhaps too easy
to get hold of, and the need for skilful handling of the testees and
for training in interpretation of results is insufficiently realised.
Among older school pupils and average or above-average
adults they are often more reliable and at least as valid as individual
tests, since they can readily be constructed to suit the requisite
range of ability; whereas individual tests are most needed at the
bottom end of the scale and cannot simultaneously cover the
upper ranges as thoroughly. The group test, however, implicitly
assumes that all testees are in the right frame of mind, that they
co-operate and work keenly and quickly, follow the tester's
oral instructions, and are capable of reading and understanding any
printed instructions. But these assumptions cannot always be
fulfilled, and the great advantage of the individual test is that the
tester can control such disturbing factors as fatigue, poor eyesight,
poor reading ability, anxiety or undue caution, distractibility and
inadequate motivation. If he cannot dispel these he can at least
observe their effects on test performance, which the group tester
cannot. For such reasons individual tests are absolutely essential
for pre-school children, and very desirable for 5-7-year olds. The
proportion of unstable children who fail to settle down to the
group test situation decreases with age, though even at 11 there
are probably 1 or 2 per cent who do not do themselves justice
([...] Connor, 1952). All suspected defectives, psychotics,
brain-damaged patients and certain types of neurotics similarly
require individual handling if their results are to be [...].

The same holds good for educational tests. After all, children are
not usually expected to cope with formal examinations till 9,
10, or later. Class exercises are often introduced earlier, but
nothing hangs on them, and the teacher can generally observe if
there are misunderstandings or disturbing influences.
At the same time we should point out that the good reliability
and validity of most group tests ensures that their results cannot be
so seriously affected by the testees' feelings and motivation, or by
the manner in which the test is given, as some critics suppose.
War-time studies of the effects of state of health, and menstruation
in women, on test performance yielded entirely negative results
(cf. Vernon and Parry, 1949). And some investigations of
incentives suggest that when children are made extra-keen by the
offering of monetary rewards, they attempt more items but also
get more wrong, so that their actual scores are little altered.
Nevertheless, we agree with Heim (1954) that more research is
needed into the effects of the attitudes of the testees, and that there
is a good case for seeing that testers are adequately trained to instil
the proper atmosphere, as well as to follow the instructions.
The reading capacity of the testees requires consideration in
almost all group tests. Thus it has been demonstrated that the
typical intelligence test for 11-year olds requires a reading age of
over 9 years for the understanding of instructions and verbal
items. This means that, for the bottom quarter of the age group,
such tests are effectively little more than reading tests. Many
[...] depend almost as much on [...] group factors other than
verbal ability.

Other reasons why most psychologists prefer to rely on an
individual test when important decisions have to be taken are the
[...] are composed, and the fact that [...] speed. This latter
factor (which [...]) should not be exaggerated, [...] constructed
group [...] as capacity for working at [...] who range widely in
age [...]
VARIETIES OF TESTS AND ITEMS


In addition to the classification (oral, printed verbal, non-verbal
or mixed), tests may belong to the battery or the omnibus
types. Both normally include half a dozen to a dozen kinds of
items, each ranging from easy to difficult. But in the battery test
all the items of any one kind are put together in a sub-test with its
own instructions and practice samples, and with its own time
limit (which may be anything from 3 to 20 minutes). In the omnibus
test, however, all kinds of items are mixed up; or there are
short sets of each kind, first at an easy, then at more difficult levels.
There is a single time limit for the whole test (usually 45 minutes,
though ranging from 30 to 90), which means that it is more easily
applied by an untrained tester, without a stop-watch. In this form,
it is more difficult to provide adequate instructions and explanations.
Sometimes a preliminary practice sheet is given with
examples of each kind of item, but usually each item-type has to
be explained and illustrated each time it recurs; as no oral help can
be given, comprehension of and attention to the printed instructions
become a major part of the test. Convenience of administration
also means that the testees have to work without a
break. Possibly the short spells at the separate sub-tests in a battery
induce better concentration. Nevertheless, no one has ever
proved that omnibus tests are any less valid,¹ and it should always
be remembered that, in either form, every set of items with its
instructions has been tried out beforehand and found to work. The
same answer can be given to armchair critics who complain that
certain items (usually taken out of context) are far too difficult,
and that children should not be expected to do 100 of them in 45
minutes. It is, of course, only the oldest and brightest of those for
whom the test is intended who finish, or [...]; [...]
children are not expected to do 50 or so items, [...] do so.
¹ A controlled experiment by the present writer showed that there was no
difference in the coachability or susceptibility to practice of the two
types of test; also that the average scores were, if anything, slightly
higher in the omnibus version.

We have admitted that the items commonly used are rather
artificial. Though there are innumerable minor variations in
form, and the precise content differs in every new test, yet the
range of thinking processes called upon is remarkably limited. It
would seem as though the early psychometrists (Ebbinghaus,
Spearman, Terman, Otis, Thomson and Burt) chose a few types
quite arbitrarily which were easy to construct and administer and
score objectively; and that few later authors have been able to
escape from their conventions. Almost inevitably such items are
limited to ‘closed systems’ of thinking (p. 39), predetermined by
the test constructor, and thus may fail to include the rich varieties
of thought that children and adults display in everyday life.
Nevertheless if new and more natural items could be invented, it
would very likely be found that they would measure the same g
(or R) and V factors as the present ones. Moreover, Guilford and
his colleagues are now greatly expanding the range, and attempt-

tional group factors are involved. 


differences in abilities at different materials may even outweigh in 


Directions Items (oral or printed) 


Ex. 12. If the moon is larger than the sun, write the letter K; if not,
write R.


Alternatively, testees may be directed to mark particular figures
or pictures on their answer sheets. More difficult items become
more complex, thus invoking a considerable effort of comprehension;
or sometimes they bring in spatial orientation.


Ex. 13. Draw 6 circles, 2 large and 4 small ones, none of them touching.
Two of the small circles should be inside one of the large ones, the
others outside.


Vocabulary and Opposites


Ex. 14. Discord means the opposite of (contented, laughter,
encouragement, harmony, comfort)

Ex. 15. Underline the word which means most nearly the same as the
word in capital letters.
SHUN (hate stand avoid destroy examine exclude)

Ex. 16. What is it that flies through the air, and can sing?
(aeroplane, butterfly, music, nightingale)


Note that the wrong alternatives are chosen as being fairly
plausible errors. Another variant is the synonym-antonym test
(Ex. 11). Yet another takes the following form:


Ex. 17. Underline two of the following words which mean either 
the same as or the opposite of each other. 
(quick perfunctory tardy metallic smelling meticulous upright) 


General Information 

Ex. 18. The tuna is a kind of: musical instrument, fish, bird,
weapon.

Provided such items are drawn from a wide field rather than
dependent on specialised knowledge, this test correlates very
highly with other verbal intelligence tests. (Indeed, there is obviously
little to distinguish Ex. 18 from Ex. 16.) However, informational
items show a considerable sex difference in favour of
males, whereas more purely linguistic ones tend to favour females.
A variant is sometimes called the:


Always Has Test

Ex. 19. A shop always has:
(salesmen, books, a fireplace, a door)

The same type of item can be expressed in terms of pictures,
given with oral directions.

The next two species might be termed Word (or Non-verbal)
Relationships.


Classification 


Ex. 20. Underline the one word (figure, etc.) which does not belong
with the rest, or which is the ‘odd man out’.

Table Crate Fireplace Mallet Door

Ex. 21. DCX GHV LKT JIZ FER

Ex. 22. [Figural item; figure not reproduced.]

Similar tests are constructed of pictures of familiar objects. The
capacity to generalise and abstract the common feature always has
a high g-saturation. Nevertheless, it is difficult to make this
item-form watertight. Thus in Ex. 20 it might be argued that Mallet
(the only tool) is as good an answer as Fireplace (the only one not
made of wood). The following is less ambiguous, but also more
complex to ‘get across’.


Ex. 23. The three words in capitals are alike in some way. Underline
two of the other words which also belong.

IVORY SNOW MILK (water cottonwool butter cold flour)


Analogies. These have probably been more widely used than
any other type. The ordinary verbal form is illustrated in Ex. 7.
Pictorial or figural ones can likewise be devised at an easy or
difficult level.


Ex. 24. [Figural analogy; figure not reproduced.]

A complex variant is:

Ex. 25. Underline two of the words in brackets which are connected
in the same way as the words in capitals.
HOUR is to MINUTE as: (day second clock minute time week)

Analogies can also be expressed in matrix-form; note that the
following overlap with our later category of Induction tests.

Ex. 26. Fill in the missing number

2 5 8
4 10 16
8 20 _
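
The rule behind Ex. 26 can be checked mechanically. The following is only a sketch, assuming (our reading, not stated in the text) that every row is a constant multiple of the row above it and that the first column is complete; the name `fill_missing` is hypothetical.

```python
def fill_missing(grid):
    """Fill the single None cell, assuming each row is a constant
    multiple of the row above it."""
    # Estimate the row-to-row ratio from the first column (assumed complete).
    ratio = grid[1][0] / grid[0][0]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is None:
                # Prefer the cell above; fall back to the cell below.
                if r > 0:
                    return grid[r - 1][c] * ratio
                return grid[r + 1][c] / ratio
    raise ValueError("no missing cell in grid")


matrix = [[2, 5, 8], [4, 10, 16], [8, 20, None]]
answer = fill_missing(matrix)
```

Here the ratio is 2, so the gap is filled by doubling 16. A testee may equally well reason along the rows (differences of 3, 6 and 12), which leads to the same figure.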
Ex. 27. Underline one of the four figures which fills the gap.

[Figure not reproduced.]

Next we have several species which might be generally termed
comprehension of sentences, pictures or shapes. Ebbinghaus’s
Sentence Completion (Ex. 1) was the first of these. For objective
marking it is almost essential to turn it into multiple-choice form.

Ex. 28. The (moon, star, sun) shines more often in (summer, the dark,
winter) than in (winter, summer, the night).

Almost indistinguishable items occur in Reading Comprehension
tests (cf. p. 96). The same form can be used to provide a type of
reasoning test.

Ex. 29. John is one year older than Peter, and Stephen is 3 years
younger than John. Hence Peter is 2 years older [...]
[Printed alternatives illegible.]

Completion items can also be used in the pictorial medium:
for example, a door with a knob missing: the testee has to draw
it in, or select it from pictures of other objects. Figures or patterns
provide other possibilities (cf. Exs. 27 and 55). Here is a
numerical example:

Ex. 30. Fill in + or - signs between these numbers so that they
give the correct answers:

[Numbers not reproduced.]
Mixed Sentences. Re-arrangement of the words of a sentence into the
correct order involves grasping its sense. One way of avoiding the
need for testees to rewrite the sentence is illustrated in Ex. 9. Here
is another:

Ex. 31. Underline two words which, if changed around, would
make the sentence sensible.

No sacrifice can be obtained without some object.

Mixed words, i.e. anagrams, are occasionally used, but probably
involve a rather larger specific component than other verbal tests.


True-False and Absurdities 


Ex. 32. If tomorrow were Monday, yesterday would be Sunday.
(Underline TRUE or FALSE)

Ex. 33. ALL SOME NO motor cars travel faster than railway
trains.

Ex. 34. The road from my house goes downhill all the way to town
and downhill all the way home again. (Underline SILLY or SENSIBLE)

Or multiple-choice answers are provided, one of which explains
why the statement is absurd.

Obviously there is little distinction between these and Information
or Always Has items on the one hand, and Reasoning
items on the other. This type of item is very suitable for oral
presentation, since the response involves a minimum of reading.
Pictorial absurdities are likewise used.

Proverbs. This is a clumsy type of item, but it is certainly
effective in eliciting comprehension of complex sentences.
Usually a set of English (or [...] or imaginary) proverbs has
to be matched with a set of explanations. Thus, in these examples
B is the answer to Ex. 35, and A to Ex. 36.

Ex. 35. Look before you leap.

Ex. 36. Handsome is as handsome does.


A. Good deeds are more important than good looks.
B. Don't do anything important without taking care.
C. Behave well to other people if you want them to behave well to
you.

Reasoning. When the testee has to carry out mental manipulations
of the material presented, put forward hypotheses, and check
them, we may call it reasoning or problem-solving. Most of the
item-types so far illustrated involve such reasoning when they
reach a certain level of difficulty; but the following are more
direct examples.


Ex. 37. Margaret is shorter than Kathleen, and Kathleen is taller 
than Joan. Who is the tallest? (Margaret Kathleen Joan Can't tell). 


Ex. 38. In a class of 30 children, 20 were boys and 10 were girls.
Half the class played cricket and the other half played hockey. Did
any boys play hockey? (Yes No Can't Say)

More complex examples may employ formal logical syllogisms, 
as in Valentine’s series for superior adults (1954), or may be 
elaborated into something resembling a short detective story. 
Other problems may be based on spatial orientation. 


Ex. 39. A man started walking west; he turned right, then right 
again, then left. In what direction was he walking now? 
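
An orientation item such as Ex. 39 reduces to counting quarter-turns round the compass. A minimal sketch, assuming every turn is a right angle; the names `HEADINGS` and `final_heading` are illustrative only.

```python
HEADINGS = ["north", "east", "south", "west"]  # listed in clockwise order


def final_heading(start, turns):
    """Return the compass heading after a sequence of quarter-turns."""
    index = HEADINGS.index(start)
    for turn in turns:
        if turn == "right":
            index = (index + 1) % 4  # one step clockwise
        elif turn == "left":
            index = (index - 1) % 4  # one step anti-clockwise
        else:
            raise ValueError(f"unknown turn: {turn!r}")
    return HEADINGS[index]


heading = final_heading("west", ["right", "right", "left"])
```

Starting west, the two right turns bring the walker round to north and then east, and the final left turn takes him back to north.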


Ordinary problem arithmetic sums also often appear in intelligence
tests and are known to provide one of the best tests of R.

It is difficult to find agreement among factorists as to how many
reasoning factors should be distinguished, or whether they can
be reduced to Spearman’s g. Nevertheless there is considerable
evidence for separating deductive reasoning from inductive. The
former involves thinking out the logical consequences of given
premises (as in Exs. 37 and 38), whereas the latter depends on the
discovery of an underlying principle (as in Exs. 46 to 55). On the
other hand, the difference may be merely that deduction tests are
almost always verbal, whereas induction ones are usually
symbolic or numerical. Other tests which tend to fall under the
deductive heading are Pedigrees, Codes or Substitution, and
Language translations.

In Pedigrees, a complex chart of 3 or more generations of a
family is shown and several questions asked on it of the kind:

Ex. 40. What relation is John to William? (brother cousin uncle
stepbrother brother-in-law)

Codes take various forms:


Ex. 41. A boy made up a code in which LONDON was written 
JMLBML. What did the code letters UCCI stand for? 


(This would appear to be as much inductive as deductive.)

Ex. 42. In another code, A = 2, B = 4, C = 6, D = 8, E = 10,
etc. What is the code for Z?

Ex. 43. In the same code, if you subtract C from F and add B, what
letter would be the answer?
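
The three code items can be worked through as follows. This is a sketch only: it assumes that the code of Ex. 41 moves every letter the same number of places along the alphabet (which the LONDON pair bears out), and that in Exs. 42 and 43 each letter is worth twice its position. The helper names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(text, k):
    """Move every letter k places along the alphabet, wrapping round at Z."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in text)


# Ex. 41: infer the shift from the worked pair, then undo it on the new word.
plain, coded = "LONDON", "JMLBML"
k = (ALPHABET.index(coded[0]) - ALPHABET.index(plain[0])) % 26
assert shift(plain, k) == coded  # the same shift must fit every letter
decoded = shift("UCCI", -k)


# Exs. 42 and 43: each letter is worth twice its position in the alphabet.
def value(letter):
    return 2 * (ALPHABET.index(letter) + 1)


def letter_for(number):
    return ALPHABET[number // 2 - 1]


code_for_z = value("Z")
ex43 = letter_for(value("F") - value("C") + value("B"))
```

Under these assumptions the code letters of Ex. 41 stand for WEEK, the code for Z is 52, and Ex. 43 gives 12 - 6 + 4 = 10, which is the letter E.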


Digit-Symbol Substitution is another simple type of coding. 
Ex. 44. Write the correct number under each symbol, doing as many as
possible in 2 minutes.

[Key of numbers and symbols not reproduced.]

Possibly this should be classified as a perceptual speed test, but
it has been included in several non-verbal intelligence tests (cf.
p. 59).


Translations 


Ex. 45. The Sanskrit sentence:

KAMALA MONOHARAM means, in English, ‘A lovely lotus’.
TADAGE VARTATI KAMALAM means, in English, ‘The lotus is
in the pond’.

What is the Sanskrit word for ‘lotus’ ...?
What is the Sanskrit word for ‘lovely’ ...?


Induction Tests. The most widely used of these is the Number
Series (cf. Ex. 8). The same principle can be embodied in Letter
Series and Figure Series.

Ex. 46. b a d c f e ... Write, or underline, the letter that
comes next (f g h i).

Ex. 47. [Figure series not reproduced.] Continue the pattern.

Ex. 48. [Figure series not reproduced.] Underline one of these which
would fill the gap.

Ex. 49. [Figures not reproduced.] These five figures could be
re-arranged in a series; underline the middle one of the series.


The same item-forms can be used with pictures (e.g. stages in a
boy’s getting up and going to school), or, less readily, with
words.

Ex. 50. Summer New Year Easter Christmas November
(Underline the middle item, or the first and last items).

The Abstraction item-form, introduced by Shipley (1940), is
particularly appropriate for symbolic materials. It can also cover
analogies, codes and other problems with the same very simple
set of instructions:

Wherever you see an asterisk, one letter or number is missing; write
in the missing letters or numbers.

51. [Item illegible.]
52. luck lick lack foul foil * * * *
53. quay key owe oh son * * *
54. England 1234526 France 785291 Greece * * * * * *
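
Item 54 is a small substitution code that can be recovered from the two worked words. A minimal sketch, assuming each letter always stands for the same digit and that capitals are ignored; `learn_code` and `encode` are illustrative names.

```python
def learn_code(examples):
    """Build a letter-to-digit table from (word, digits) pairs."""
    table = {}
    for word, digits in examples:
        if len(word) != len(digits):
            raise ValueError(f"{word!r} and {digits!r} differ in length")
        for letter, digit in zip(word.lower(), digits):
            # setdefault stores the digit on first sight and returns the
            # stored one afterwards, so any contradiction shows up here.
            if table.setdefault(letter, digit) != digit:
                raise ValueError(f"inconsistent code for {letter!r}")
    return table


def encode(word, table):
    """Spell a word in digits; raises KeyError for a letter not yet seen."""
    return "".join(table[letter] for letter in word.lower())


table = learn_code([("England", "1234526"), ("France", "785291")])
greece = encode("Greece", table)
```

Every letter of Greece already occurs in England or France, so the six missing digits follow directly from the table: 381191.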
Finally we have the complex two-dimensional series item, as
represented by Raven’s Progressive Matrices test.

Ex. 55. [Figure not reproduced.]

FIG. 3. Specimen Item as in the Progressive Matrices Test. Which of the
numbered pieces 1-8, if fitted into the empty space above, would complete
the pattern?


TESTS INVOLVING OTHER FACTORS 


All the above items can be regarded chiefly as measures of g and
V, though some factor analysts would classify several of them
under various reasoning factors. Those to be mentioned now may
also be found in published intelligence tests, but tend to show
clearer evidence of distinctive factors, particularly among relatively
homogeneous groups, such as grammar school pupils or university
students.

N or number factor is evident in any numerical material (e.g.
Number Series), but chiefly occurs in simple Four Rules sums
which have much lower g or V saturations.

k or S tests. Spatial factors do not necessarily play a large part
in all tests based on figural materials, nor in orientation problems
(Ex. 39). Many testees, for example, may solve Matrices items
largely by verbal, logical reasoning. It is the mental manipulation
of shapes, or visualising them as turned around, recombined, etc.,
that seems to be the essence of k. Three examples, the Paper
Formboard, ‘Squares’ and ‘Figure Construction’, are illustrated by
Vernon and Parry (1949). Another follows:


Ex. 56. Which of these four is the same as the first shape turned
around?

57. Which is the same as the first shape turned over, or as seen
in a mirror?

58. Which is the same as the first shape turned over and [...]?

[Figures not reproduced.]

Block-counting was used in Army Beta, the American Army
General Classification Test of World War II, and other batteries.

Ex. 59. Count the total number of blocks in the pile, including those
you cannot see.

60. How many blocks in the pile touch at least 5 other blocks full
face?

[Figure not reproduced.]

Perceptual Speed or Clerical Factor. When the testee is merely
required to make comparisons at speed and pick out a shape, word
or number which is identical with the initial one, a somewhat
different factor enters. The g-saturation is considerably reduced,
and such tests correlate well with capacity for clerical work. The
more general problem of the speed factor in intelligence tests is
discussed later (p. 181).

Fluency Tests. These are supposed to measure speed and richness
of mental associations rather than understanding of ideas,
though in fact they overlap rather closely with vocabulary and
other V tests in unselected groups (cf. Rogers, 1953).

Ex. 61. Write down as many names of birds (flowers, etc.) as you
can in 3 minutes.

62. Write down words beginning with F, or ending in -tion,
etc.

Other fluency tests are based on associations with inkblots or
pictures.

Flexibility-rigidity in the Formation of New Concepts. Luchins
(1947) has proposed certain tests in which a standard ‘set’, or method
of tackling a series of problems, is established; then a new type of
problem is introduced, and the tendency to perseverate with the
standard method gives a measure of inflexibility. This construct
plays an important part in psychopathological work where, for
example, it is claimed that brain-injured, senile or some psychotic
patients are differentiated from normals more by their inability
to form new concepts than by low intelligence as conventionally
tested. Similarities or classification tests (particularly non-verbal),
Kohs Blocks, and Proverbs have been used in this context,
and Wechsler denotes several sub-tests in his scale as deteriorating
strongly with age (cf. p. 172). Similarly, Shipley (1940)
contrasted Abstraction with Vocabulary items. It is not yet clear
whether this distinction amounts to anything more than the distinction
between g and v factors. Also it is doubtful whether the
Luchins type of test brings in any new ability. However, a
concept-formation factor has been demonstrated by Lovell (1955) in
non-verbal classification and in what are called sorting tests.

Ex. 63. Sort these pieces into three boxes so that in each box the
pieces are all of the same kind.

[Figure: the pieces to be sorted, not reproduced.]

        Nos. of pieces    Description of Category
Box 1.
Box 2.
Box 3.

Now sort the pieces again into three [...]

Box 1.
etc.
Creativity and Originality. An early form of test consisted of
such questions as:

Ex. 64. If everyone in the world suddenly went blind, write out as
many consequences as possible that would be likely to occur in our
way of living.

These are difficult to score; but by compiling a list of all the
suggestions put forward by a large number of testees it is possible
to mark any subsequent set of answers objectively for cleverness
and unusualness. Guilford has devised several other such tests
which yield a distinctive factor, and which appear to predict
aptitude for creative work (cf. Wilson, 1954). These include
thinking of unusual uses for, e.g., a brick or a newspaper; listing
things that are impossible; giving unusual or original word or
inkblot associations.

Judgment or Evaluative Abilities. Guilford applies the term
‘evaluative’ to judging the validity of logical trains of thought.
But if there is a distinctive factor (or factors) it is probably
involved more in reaching conclusions on the basis of ‘practical
feasibility, experience and social custom’ rather than of logical
necessity, as in the following item:


Ex. 65. Mrs. Williams picked up her raincoat, but after opening the 
front door to go, turned back and left the raincoat behind, because: 
it was too heavy for her 
she thought it would not rain 
it would hide her new dress, if she wore it 
her husband did not like her wearing a raincoat 
her raincoat was rather too old to wear. 


Thorndike suggested that ‘social intelligence’ could be distinguished
from ‘verbal’ and ‘practical intelligence’. Others have
devised tests of critical or rational thinking, for example, judging
whether a writer’s arguments are logical or biased and emotional
(cf. Watson and Glaser, 1942). As yet there is little evidence that
such tests measure anything different from ordinary reading
comprehension.

Finally, there are a number of tests in the fields of artistic,
literary or musical evaluation, where the testee has to discriminate
between examples of good and bad art, poetry, etc. Though such
abilities probably depend largely on g + v + attainment in these
fields, they may well bring in additional factors of artistic taste
(cf. Dewar, 1938).


SURVEY OF GROUP TESTS IN COMMON USE 
A full list of tests published in Britain in 1956, with age ranges,
times, publishers and prices is given elsewhere (Vernon, 1956);
Buros’s Yearbooks (1959) provide details and critical reviews of
American and many English tests. Here we will merely draw
attention to the main features of some of the most commonly
used intelligence tests.

Oral Verbal Tests. There are two promising modern tests:
Cornwell’s Group Test of Intelligence for Juniors (1952) and Tomlinson’s
Junior School Test (1953); both have deviation I.Q. norms.
They take several school periods to administer, and Tomlinson,
in particular, depends largely on the capacity to read words
written on the blackboard. There is room for a shorter, simpler
test which would be appropriate for first- and second-year juniors;
for example, Ballard’s Group Test for Juniors (1922) might be revised,
shortened and renormed.

Oral Pictorial and Non-verbal. The Sleight Non-Verbal Intelligence
Test (1931) includes a good range of pictorial and figural items,
and has been one of the most widely used junior school tests for
many years. It provides M.A. norms only, whereas Mellone’s
Moray House Picture Test (1944) yields deviation I.Q.s. As
already mentioned, tests of this type are unlikely to correlate
anything like as highly with educational capacities as verbal tests;
and they should not be thought of as giving a better measure of
inborn ability, freed from environmental influence.

Wide-range Verbal Tests. Burt's Northumberland Standardised Test
No. 3 (1925), Richardson’s Simplex Junior (1932), and Schonell and
Adams’s Essential Intelligence Tests (1940) all embody a variety of
the verbal item-types listed above. Their Mental Age norms run
from 6 or 7 up to 16, but this means that their effective use is much
more restricted, since they will not discriminate among dull
children much below 9, nor bright ones above 12. Their Standard
Deviations are fairly high, so that their I.Q.s are not comparable
with WISC or Moray House I.Q.s; and, in these days of widespread
coaching and familiarity with tests, their norms are some
5 to 10 points too generous. This is unfortunate, since they are
among the most accessible and the easiest for teachers to apply in
their own classes.
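
The point about Standard Deviations can be made concrete. A minimal sketch, assuming that quotients on both scales are centred on 100 and that a child keeps the same standing in standard-deviation units when moved from one scale to the other. The spread of 20 given to the wide-range test is purely illustrative, not a published figure; 15 is the spread conventionally used by scales such as the WISC.

```python
def rescale_quotient(iq, sd_from, sd_to=15.0, mean=100.0):
    """Re-express a quotient from a scale with spread sd_from on a scale
    with spread sd_to, keeping its distance from the mean in S.D. units."""
    if sd_from <= 0 or sd_to <= 0:
        raise ValueError("standard deviations must be positive")
    z = (iq - mean) / sd_from  # standing in standard-deviation units
    return mean + z * sd_to


# Hypothetical case: an I.Q. of 130 on a test whose S.D. is 20.
equivalent = rescale_quotient(130, sd_from=20.0)
```

On these assumed figures the 130 shrinks to 122.5 on the narrower scale, which shows why quotients from tests with unequal spreads cannot be set side by side without adjustment.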

Verbal Tests for the Selection Examination. The tests issued
annually for the 11-plus examination by Moray House and by the
National Foundation for Educational Research are technically
among the most sophisticated anywhere in the world. They provide
highly reliable deviation I.Q.s over a narrow age-range (say,
10·0 to 12·0) which are comparable, to within 1 or 2 points, from
year to year (complete comparability for older versions is impossible
because of the rather general upward trend since World
War II; cf. Pilliner, 1960). The types of items are somewhat narrow
because of the needs for easily intelligible instructions in
omnibus tests and for easy scoring; also because the users, local
education authorities, prefer a standard instrument. More
variety is apparent in recent years. A further interesting feature is
that they do not claim to be ‘intelligence tests’; the National
Foundation has always referred to its series as Verbal Tests, and
Moray House has now adopted the nomenclature Verbal Reasoning
Tests. These tests are, wisely, not on general sale, though
older versions may be released to research workers or other
authorised persons.




Non-verbal Group Tests for Children and Adults. The Jenkins
Non-verbal Scale of Mental Ability (1947), and a parallel form by Lee
and Jenkins, provide reliable g-tests for 10- and 11-year olds
with a minimum of verbal and spatial content. Their correlations
with achievement in a grammar school are rather low, but they
probably constitute better predictors of mathematical, scientific
and technical aptitude than do verbal tests.

The National Foundation for Educational Research issues other
non-verbal tests, together with a series containing mixed verbal
and non-verbal sub-tests. Heim’s Test AH4, for adolescents and
normal adults, is also a combined test, probably having fairly
strong number and spatial loadings, and is therefore useful for
vocational purposes. Earle (1948) has published a series of Duplex
Tests for 10-, 11-, 12- and 13-year olds respectively, containing
a variety of verbal, numerical, spatial and mechanical items. Their
aim is to show differential suitability for different types of secondary
course (academic, technical, commercial, etc.), though they
also yield a total result which can be converted to an I.Q.

The most widely used non-verbal test is Raven’s Progressive
Matrices (1938, 1947), of which there are several forms. Sets A, Ab
and B are coloured, and are intended primarily for individual
application to children of 5 years upwards or to senile patients.
The 1938 series can be given in group form from 10 to adult
levels, while the 1947 Sets I and II apply mainly to older children
and superior adults. The tests are very easy to explain and administer
and are usually given without time limit. Thus they are
suitable for, say, mental hospital nurses to apply to defective or
neurotic patients. The 1938 series was used (with a 20-minute
time limit) with almost all naval and army recruits from 1941 [...].

The National Institute of Industrial Psychology’s Group Test
70/23 (1939) may be recommended as a short but effective non-verbal
test for 14 years up and adults. Finally, R. B. Cattell has
published parallel forms of Culture-free Intelligence Tests (1949),
Scale II for children and Scale II for adults. While we would reject
the claim that any tests are culturally neutral, good evidence
has been provided by Cattell that such tests give a fairer picture of
intelligence level among linguistically handicapped persons such
as foreign immigrants than do verbal tests.

Verbal Tests for Adolescents and Adults. Among the older tests,
Cattell’s (1935) Intelligence Scales II and III (Forms A and B) deserve
mention as providing very thorough measures of g + v.
Scale III is one of the most difficult available for superior adults.
Though carefully standardised (Cattell, 1934), the tests seem to
yield abnormally high I.Q.s, largely because they have big Standard
Deviations.

Wiseman’s Manchester General Ability Test, for 13- to 14½-year
olds, and Thomson’s Moray House Adult Test, for 13½ to adult,
follow the omnibus pattern of the 11-plus Moray House tests
and provide deviation I.Q. norms.

The National Institute of Industrial Psychology’s Group Test
33 (1923) and Group Test 35 are suitable for grammar school and
student populations, and have percentile norms. Like the Northumberland
Test of Burt (who also devised Test 33), they are in
battery form. Group Test 90 (1948) is a similar test for the normal
adult range, with norms for age groups from 21 to 60.

One other high-grade test, suitable only among the top 5 per
cent of the population, is Heim’s AH5, with verbal and non-verbal
omnibus sections. Finally, there are a large number of tests
devised by psychologists in the Armed Services and the Civil 
Service Commission for recruits, officers, clerks, administrators, 
etc., which are not, of course, publicly available. In general, they 
are chosen more for their relevance to a range of occupations than 
as tests of intelligence or other mental qualities or factors. 


Chapter Six 
EDUCATIONAL ATTAINMENT TESTS 


It is so convenient to be able to state that a pupil’s reading is
equal to that of the average 8-year old, or that he has an Arithmetic
Quotient of 120, that we are apt to ignore the difficulties of
educational measurement. What kind of reading or arithmetic
has been tested? With many children there may be considerable
unevenness in different aspects of a subject, and the test itself
may not cover those aspects that the particular teacher, or modern
educational opinion in general, regards as most important. The
syllabus and the methods of teaching are likely to vary between,
say, Scotland, Hampstead and Pembrokeshire so that it is unlikely
that any one test is appropriate in all these areas. Alternatively,
the child’s teachers, knowing the content of the test,
may have coached him and artificially raised his performance on
those aspects without having improved his all-round proficiency
in the subject. Though it may sound heretical for a psychometrist
to say so, the writer would hold that a child’s score on a 30- or
40-minute test may often provide a less representative picture of
his attainment than the impressions of the teacher who has observed
the whole of his work over several months.
The apparent accuracy of educational age units, and of quotients
derived therefrom, is also misleading. They assume that the average
child gains an equivalent amount each year in each [...]
tested, though this obviously breaks down at the time of the
‘push’ for the 11-plus examinations and during the subsequent
years in a secondary modern school. And they assume that
standards remain constant, although some tests which are still in
common use date back [...] to 40 years, and there is clear proof of a
widespread decline during World War II which has only partially
been made up (cf. p. 163). Nor is it very meaningful to apply the
same norms in a rural area, an industrial slum and a [...]-class
suburb. As suggested elsewhere, teachers and psychologists
who use tests for educational guidance would do better to build
up their own local percentile norms for the type of schools they
are concerned with, and to revise these periodically, rather than
rely on what are probably outdated national age norms (Vernon,
1956).
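
Building up local percentile norms need involve no more than the following. This is a sketch under stated assumptions: the scores are invented for illustration, and the mid-rank convention (half of any tied scores counted as falling below) is one common choice rather than anything laid down in the text.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(local_scores, score):
    """Percentage of the local group falling below the given score,
    counting half of those who tie with it (mid-rank convention)."""
    ordered = sorted(local_scores)
    below = bisect_left(ordered, score)
    tied = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * tied) / len(ordered)


# Hypothetical raw scores collected from one group of local schools.
local = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]
rank = percentile_rank(local, 20)
```

With these figures a raw score of 20 has four children below it and one level with it, giving a percentile rank of 45. Revising the norms periodically simply means re-running the calculation on a fresh set of local scores.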

However, provided their limitations are borne in mind, attain- 
ment tests can admirably fulfil certain purposes. Group tests, such 
as those used at 11-plus, can give a fairly thorough, though not 
completely comprehensive, sampling of the skills which are 
regarded as important by most primary and secondary schools. 
Thus they are particularly useful for large-scale selection purposes, 
or for surveys of schools or areas, where it would be almost im- 
possible to standardise the marking of more conventional, but 
subjective, examinations. Similar tests of more specific objectives 
are practically indispensable for investigations into teaching 
methods. For example, if an educationist wishes to study scientifically
the advantages of television lessons, he should devise tests
which will pinpoint the particular abilities that televised (and non- 
televised) instruction would be expected to develop. Many such 
tests are embodied in student research theses, but are unpublished. 
Note that for none of the purposes so far described does the prob- 
lem of norms arise. In selection, Educational Quotients (corrected 
for age) are usually worked out by the same technique as that 
used for deviation I.Q.s (p. 50); but the crucial standard or 
borderline is generally set by the number of grammar-school 
places available. In surveys or experimental research, meaningful 
comparisons can be made between the scores of various groups 
without resort to age units or quotients. 
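
A minimal sketch of the selection procedure just described, assuming that all raw scores come from a single age group and that quotients are scaled to a mean of 100 and S.D. of 15 (the usual convention for deviation quotients; the technique on p. 50 is not reproduced here). The function names and the figures are hypothetical. The borderline is simply the quotient of the last candidate for whom a place exists.

```python
from statistics import mean, pstdev


def deviation_quotients(raw_scores, target_mean=100.0, target_sd=15.0):
    """Rescale the raw scores of one age group to the target mean and S.D."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population S.D. of the age group
    return [target_mean + target_sd * (x - m) / sd for x in raw_scores]


def borderline(quotients, places):
    """Quotient of the lowest-ranked candidate who still obtains a place."""
    ranked = sorted(quotients, reverse=True)
    return ranked[places - 1]


raw = [31, 44, 52, 58, 63, 70, 77, 85]   # hypothetical raw scores
quotients = deviation_quotients(raw)
cut = borderline(quotients, places=2)    # only two places available
selected = [q for q in quotients if q >= cut]
```

The cut-off here depends only on the number of places, not on any external norm, which is the point made in the text.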

However, let us now discuss tests used in educational guidance,
whose aim is to assess the performance of individuals, or school
classes, with reference to some set of wider, external standards, and
thereby to judge whether children are failing to make reasonable
educational progress, and if so what educational treatment is
called for.

Individual Reading Tests. Though all aspects of reading overlap
greatly, there is general agreement, backed up by factorial
studies, that [...] ability, reading speed and comprehension
are relatively distinguishable. [...] tested by Graded Vocabulary
[...] from those that can be
read by 5- to 6-year olds up to those likely to be known only by
[...] year olds [...]. Burt (1921), Vernon (1938) and Schonell
(1945) have issued similar tests [...]
on up to higher age levels than the others, but its Reading Ages
from 15 to 21 represent purely arbitrary units for expressing 
superior skill among older pupils or adults; and it is doubtful 
whether pronouncing ability is of much educational importance 
among older grammar-school pupils. Schonell’s test, being the 
most recent, is likely to be the most suitable, in England at least. 
The norms for these tests differ, though not markedly, and a 
conversion table is available (Ministry of Education, 1950). The 
tests are easily administered and scored, in a few minutes per 
child, even by relatively inexperienced teachers; and they provide 
useful opportunities for noting types of error. 

Ballard (1923) and Burt supply lists of monosyllables, from the 
number of which read aloud in one minute, reading speed can 
be gauged. But speed of reading continuous material is more 
important and this is included, along with measures of accuracy, 
in Burt’s, Schonell’s (1950) and Neale’s (1958) prose-reading tests.
In Burt’s ‘King of the Golden River’ and Schonell’s ‘My Dog’,
time and errors are recorded, and thereafter a series of questions
is asked to test understanding. Neither of these is likely to be
sufficiently reliable to be worthwhile, and Watts’s (1944) plan, in
his Holborn Reading Scale for 63-104 year children, of having a
series of sentences graded in difficulty of pronunciation and
comprehension is superior. Unfortunately, Watts provides no
comprehension norms; he seems to assume that children should
be able to understand what they can pronounce. Much the best
test of this type, over the Junior school range, is Neale’s Analysis
of Reading Ability, which provides three parallel sets of passages
and up-to-date norms for the three aspects we have mentioned.
Neale also allows for recording and tabulation of various types of
[...] a series of supplementary (un-normed)
diagnostic tests for very backward [...] to bring
[...] difficulty; these include visual discrimination,
knowledge of phonics, and tendency [...]

Daniels and Diack’s (1958) Standard Test of Reading Skill, like
the Holborn Scale, grades reading accuracy (though not
comprehension) from 5 to 9 by means of sentences. It is
supplemented by a large number of un-normed diagnostic tests, of
visual and auditory discrimination, letter and word recognition,
etc.

Vocabulary is usually tested as part of the Terman-Merrill or
Wechsler intelligence scales. However, Raven’s (1943) Crichton
and Mill Hill vocabulary scales cover oral vocabulary from 4 
years up to adult. In addition, there are multiple-choice versions, 
from 11 up, which can be given in written group form. 

Spelling. While an individual reading or oral vocabulary test 
provides useful diagnostic information besides the quantitative 
score, there is little point in testing spelling or arithmetic
individually. However, the following graded tests are more
convenient for individual administration, since the child can be
started at a level where he is likely to manage all the items (as in 
Binet-testing), and be stopped when he has reached his limit, 
whereas a much wider range of words would need to be given to 
cover the varied spelling abilities of a whole school class. 

Burt’s Graded Vocabulary Test provides 10 words for each year 
from 5 to 15. It is likely that both the norms, and the order of 
difficulty of the words, have altered since it was published in 1921. 
Schonell has two similar lists which are also quite old; standards
may well have declined during World War II. Elsewhere (1932)
he provides short graded tests of irregular and regular words, and
supplementary individual diagnostic tests of visual and auditory
recall (1942). Daniels and Diack’s (1958) Graded Spelling Test,
though consisting of 40 words only, probably gives adequate
coverage of attainment in the junior school.

Arithmetic. The simplest tests for 6-8 year olds are Ballard’s 
(1923) One Minute Addition and Subtraction, which consist of lists 
of combinations (2 plus 3, 6 plus 1... 5 take away 2, etc.) to be 
done at speed. There are, however, no reliable norms except those 
published by Thomson and Lawley (1942) for 7-year olds. Burt's 
Graded Oral Test, containing 10 problems at each year from 5 to 
15, is rather lengthy; but there is an abbreviated version with 4
problems a year (Burt, 19554). Whether the order of difficulty
and the norms still hold good is not known. Vernon’s Graded
Arithmetic-Mathematics Test (1949) is a written 20-minute group
test, but it can generally be given individually in 10-15 minutes.
There are 5 questions at each age level from 6 up to 21 (the units
from 15 up being arbitrary). The upper-level items involve
knowledge of secondary school mathematics:

Ex. 66. If log. 90 = 1.9542, what is log. 3?

GROUP TESTS

Many of the following are also often useful for guidance
purposes.

Group Comprehension Tests. Once a child has crossed the initial
hurdles in reading (usually at a Mental Age of about 8), and can
recognise any new regular words for himself, his capacity to
understand what he reads silently is what we chiefly wish to
know. Reading comprehension tests at all levels from 7 or 8 to
superior adult follow the same plan of presenting passages of
prose; after reading each the testee has to answer one or two, or
several, multiple-choice questions based on its content. These
cover a variety of features: the extraction of specific information,
determining the meaning of difficult words in their context,
making inferences from the material or applying what has been
read to fresh problems, evaluating the prejudices, mood or tone
of the author, and so forth. Many investigations indicate that it is
very difficult to disentangle such aspects of reading skill; whatever
the ‘faculty’ or type of ability aimed at, they all measure much the
same general capacity. Even Sentence Completion tests (similar to
Ebbinghaus’s test of intelligence, or Ex. 28) cover much the same
ground as tests involving lengthy paragraphs, and mere size of
vocabulary overlaps very highly despite its apparent ‘lower order’
of skill. Probably the subject-matter of the passage has more
influence, at least in advanced tests; different testees vary
according as it is literary, technical-scientific, philosophical,
historical, etc.

In consequence, it is very difficult to construct reliable tests of
this type without extensive trials of items, and they have to be
quite lengthy to be of much use (a typical American test for
secondary pupils or students would contain perhaps 8 passages
and 40 questions to be answered in 40 minutes). Thus there are
few, if any, adequate examples in England. Lambert (1952) has
published one for 7-year olds, lasting about 75 minutes. High-
field’s Kingston Test (1954), with creative written responses, is
suitable for o-year olds. Schonell’s (1950) Silent Reading Tests A
and B are widely used, but are effective only over the 84- to 1
year range, and are too short for adequate reliability.

Most standardised tests of English for 11-plus selection contain
reading comprehension sections, but also have other types of
item (see below) to compensate for their weaknesses. There are no
paragraph reading tests in this country for secondary pupils or
students, though experimental ones have been constructed by 
Black (1954) and others. A sentence completion test by Watts and 
Vernon, covering reading ability from about 7-year to adult 
levels, has been widely used in Ministry of Education surveys of 
literacy (1950, 1957). This is not published; but Watts’s similar 
Sentence Reading Test 1 (1958) and Daniels and Diack’s Graded 
Test of Reading Experience (1958) are available for junior children; 
and another test by Vernon and Warden for 10-year to adult
levels may be borrowed from the present writer by qualified
persons. The following illustrates the kind of item:

Ex. 67. A highly cultured environment often boosts up the academic
career of an otherwise (practical, absent, mediocre, brilliant, educated)
student.

Spelling Tests. The dictated word test has the defect of
depending greatly on the tester’s clarity of enunciation and the pupils’
familiarity with his pronunciation. This is partly, but not
completely, overcome by saying each word in the context of a
sentence:

Ex. 68. SEARCH. Search for the ball till you find it.

Lambert’s (1951) Seven Plus Assessment ensures that children
know what words to write by means of pictures or stories; it
takes approximately one hour.

American psychologists, however, prefer objective tests to
dictation tests. These take various forms:

(a) Right-wrong: a series of words, those that are mis-spelt to
be underlined.

(b) Multiple-choice.

Ex. 69. LOOK FOR serch scearch sarch sersh search

(c) Skeleton word.

Ex. 70. LOOK FOR s...ch

(d) One word in a sentence is mis-spelt; the testee finds and
underlines, then rewrites it.

Ex. 71. Serch for the purse in the field.

Nisbet [...] found that these all measure much the same
ability as dictation tests, but Cook’s (1932) more extensive series
of comparisons indicated that the last type comes nearest to
measuring the accuracy with which pupils will spell in their
compositions, i.e. in a situation where they do not know which
words need special care. Though objective spelling items
occur in tests of English at 11-plus, and several other forms
are used by the Services, there are no published, standardised tests
for British children.

Other English Skills. Schonell’s Diagnostic English Test (1950)
contains five sub-tests with separate age norms from 8 to [...]:

I. Usage, including creative and choice-response items of the
following types:

Ex. 72. We w... eating our dinner.

Ex. 73. Yesterday he [...] me a ball to play with.

II. Capitalisation and punctuation. Missing punctuation marks
or wrong letters to be written in:

Ex. 74. joan asked are you going to scotland for your holidays

This is troublesome to score, and it is more usual to substitute a
multiple-choice form.

III. Vocabulary: multiple-choice.

IV. Sentence structure. Sets of two to four short sentences are
given, to be re-ordered and combined into one complete sentence.

V. Composition (cf. below).

Burt’s (1925) Northumberland Standardised Test 2 is an older
battery which likewise supplies separate norms for vocabulary,
reading, spelling, also geography and history.

A number of unconventional tests of vocabulary, concept
development and other aspects of language (rather than of school
attainment) are given in Watts’s Language and Mental Development
of Children (1944).


English Tests in Selection at 11-plus. Moray House and National
Foundation tests almost always take the omnibus form, with a
hundred or more questions to be attempted in 40 or 45 minutes.
They are standardised like intelligence tests to a mean quotient of
100 and Standard Deviation of 15 in each monthly age group.
Reading comprehension, vocabulary and usage figure prominently,
and sometimes spelling and punctuation. Other short blocks of
items may include the following:

Finding Rhymes.
Ex. 75. PLOUGH rhymes with (cough, dough, cow, snuff, tough). 


Explaining metaphors, e.g. playing with fire; choice-responses 
are given. 

Choosing suitable titles for short poems. 

Forming plurals, opposites, tenses, parts of speech, etc. 


Ex. 76. AUDACITY. His behaviour was very ...... 


The National Foundation issues similar tests at a more
advanced level, for 12- to 13-year pupils.

English tests of this kind are highly reliable, and easy to give and 
score, but they have aroused widespread criticism because—it is 
said—they require nothing but ticking and underlining, and there- 
fore fail to measure a child’s ability to express himself in writing. 
Though the argument is exaggerated—for at this level multiple- 
choice and creative tests measure very nearly the same thing—it 
is true that objective tests have had a bad effect on teaching in 
many junior schools. Children are trained at these items, since 
coaching probably pays considerable dividends, and get insufficient 
experience of free writing. The National Foundation for Educa- 
tional Research, C. M. Fleming, and others have therefore devised 
tests which do involve creative responses but which are so formu- 
lated that there is very little room for subjectivity in scoring (cf. 
Vernon, 1957). The scoring is, of course, more onerous, but this 
seems worthwhile if the tests have better pedagogical effects. 
Some examples follow. 

Sentence completion with single words.

Ex. 77. A tricycle has three ......

Writing a grammatical ending to a sentence.

Ex. 78. I told him that he ought to ......

Writing a suitable beginning to a sentence.

Ex. 79. ...... as it was beginning to rain.

Rephrasing sentences.

Ex. 80. Jim stole the jam from the larder.

Reported speech.

Ex. 81. “May I go home now?” said Mary.
Mary asked ......

Writing a complete sentence in answer to a question.

Ex. 82. Why are you late for school, John? ......


Tests used for selection at 11-plus are not published, for
obvious reasons; but the National Foundation has issued five English
Progress Tests (for 8+, 10+, 11+, 12+, 13+), wholly or mainly
[...]

[...] It has been shown that Scottish pupils, at least, aged
11-12, write at only one-third the speed of their American
counterparts (Smith, 1951). The quality of handwriting or of composition
cannot, of course, be [...] product scale technique [...]
standard specimen, and his product is compared with the
specimens until one is found [...] quality. [...]
quality of handwriting scale is still the best available [...]
on a variety of topics (1942, 1950).

Arithmetic. Burt’s Written Graded Mechanical [...] (Test [...]),
Four Rules (Tests XXIV) and Written Graded Problems (Test [...]),
also Highfield’s Southend Test and Schonell’s (1950) Essential
Mechanical and Problem Arithmetic Tests (two forms) all provide
age norms from about 7 to 14 or 15. It is not known whether
these norms still hold after the decline during World War II. The 
National Foundation issues several well-standardised Mechanical 
Arithmetic tests for 8-9 year olds. 

Most group tests, either for selection at 11-plus or for wider 
age ranges, contain two sections, each of 20 to 25 minutes, one 
covering mechanical, the other problem, arithmetic. Examples 
include Lambert’s Seven Plus Assessment (with 20 minutes for each 
of the Four Rules), Fleming’s Cotswold and Kelvin tests, the 
National Foundation’s Arithmetic Progress Test C, and the annual 
Moray House or National Foundation arithmetic tests. More 
advanced is the National Foundation’s Mathematics Test I. 

These tests, too, are often criticised in that they stress working at 
speed, and give insufficient scope for ingenuity and application to 
long problems. But any assessment of such qualities would be 
likely to bring in subjective judgment. For the diagnosis of back- 
wardness and for guidance purposes, tests which break down 
arithmetic into separate topics are more useful (cf. Schonell, 
1957). Burt’s Northumberland Standardised Test No. 1 (1925) has 
seven sections, and could be very useful if the norms were brought 
up to date. Schonell’s Diagnostic Arithmetic Tests (1950) cover all 
the fundamental combinations, graded sums in the Four Rules, 
and mental arithmetic. For maximum information they should 
be given untimed, spread over several periods. However, they can 
be applied with time limits totalling some 80 minutes, and norms 
are provided for each sub-test from 7 to 15 years.

For a relatively quick survey of attainment either in junior or
in secondary schools, Vernon’s Arithmetic-Mathematics Test
(already described) can be given in group form with a 20-minute
limit. The only detailed test of grammar school mathematics
published in this country is Walton’s Geometry Attainment Test
(1948) for pupils aged 12-16+.

Other School Subjects. Apart from experimental tests used, for
example, in research investigations, but not standardised or
published, there are scarcely any tests in languages, social studies,
science or other subjects. Tests of French vocabulary and
grammar, by Percival (1951), are available for 1st to 5th year grammar
pupils; and Cohen’s (1949) somewhat more extensive French
tests, standardised on 1st to 3rd year Australian pupils, are
sometimes used. But we must admit a marked contrast with America,
where numerous tests in every school subject are available for any
level up to 4th year university. The difference arises partly
because secondary school and university syllabuses are so varied that
there is little common ground on which to base tests. But also, of
course, our secondary grammar schools are geared to training
pupils for conventional examinations, and see little point in
assessing their attainment along various lines with objective tests
which, by their very nature, cannot measure quite the same
abilities as do G.C.E. or university scholarship examinations.


Chapter Seven 


THE INTERPRETATION OF INTELLIGENCE 
AND EDUCATIONAL TEST RESULTS 


INEVITABLY this chapter is rather more technical than the rest. 
But no one should be encouraged to apply tests, to make use of 
their results or to criticise them, unless he is willing to acquaint 
himself with certain basic psychometric principles.

UNITS OF MENTAL GROWTH

The first problem to be faced has already been touched on 
briefly, namely the irregularities in Mental and Educational Age 
units (p. 92). In measuring lengths with a ruler, the distance from 
12 to 13 inches is the same as that from 2 to 3 inches. But it is most 
unlikely that mental growth from 12 to 13 years is the same as
that from 2 to 3. The traditional method of scoring intelligence
tests in Mental Age units and calculating Intelligence Quotients,
however, makes just this assumption. If a person of average
intelligence is to obtain an I.Q. of 100 at every age, the graph of
mental growth must take the form shown in Fig. 4a. But no
psychologist believes that growth in ability is linear throughout
childhood, or that it stops abruptly at 15 or any other age. There
is no really watertight method of determining the true course of 
development, and the attempts made by Thurstone, Dearborn, 
Heinis and others are somewhat discrepant. But Fig. 4b is certainly 
a nearer approximation. The first few years are particularly un- 
certain, but from 5 to 10 there is a period when growth can 
reasonably be regarded as roughly linear; then a slowing down 
and later a decline. No one individual, of course, conforms closely 
to this average. The slope is steeper for bright and gentler for 
dull children; in addition, each child is likely to show irregulari- 
ties, spurts and plateaus, depending on intellectual stimulation 
and experience at home and at school, and on emotional security 
or instability.

We have seen also that intelligence is not a single, unitary
entity; it comprises a host of overlapping functions which
doubtless develop at different rates at different times. Hence any curve of
development will vary more or less according to the tests used.

Fig. 4a and Fig. 4b. The Growth of Intelligence (intelligence plotted against age).

The age at which growth ceases likewise varies with the
individual and his environment. It is sometimes stated that duller
children reach their maximum at an earlier age than bright ones,
but the evidence for this is dubious. More probably, as will be
[...] in Chap. [...], this age depends on the [...] and
quality of schooling [...] the intellectual [...] adolescent, or
[...]

Educational Age units likewise vary in size. Among British
[...] the [...] increment tends to occur in the final year
[...] there is little further [...]
pupil. But this [...] in
different schools and in different subjects. Mechanical arithmetic,
for example, tends to drop [...]

[...] limited 5 to 19 age-range [...] calculated from age scores by
formula [...] will inevitably show fluctuations [...]

[...] substitute percentile or standard score norms for each separate
age-group. Unfortunately, the age system has become too widely
known to, and accepted by, school teachers, medical officers and 
others concerned with children’s development to be discarded 
easily, and it is undeniably convenient over the primary school 
range. But even if we have to continue to use it for many years 
to come, we should be aware of its defects. 


THE NORMAL DISTRIBUTION OF ABILITIES 


We must consider next the implications of a very fundamental 
principle in mental testing—the tendency for human abilities to 
be normally distributed. If a large, unselected group of children 
or adults is tested, and the numbers obtaining each score, or I.Q. 
or E.Q. are plotted, the graph usually approximates to the sym- 
metrical, bell-shaped curve known as the normal or Gaussian 
distribution. The largest proportion score near to the average or 
mean, and there are fewer and fewer as we proceed to either ex- 
treme. In practice, the graph is always somewhat irregular unless 
the numbers are very large; and if the group has been creamed, 
or in other ways selected, it will, of course, tend to become 
skewed rather than symmetrical.1 But in any large, unselected 
group, divergence from normality arises chiefly because the 
units of measurement are unequal. Suppose, for example, a test 
contained a lot of easy and medium items, but only a few, 
steeply graded difficult ones, then an otherwise normal
distribution might assume the shape shown in Fig. 5. But if, now, the
higher units (at the right-hand end) expanded in width and
correspondingly decreased in height, the graph would become
symmetrical again.

Different tests of intelligence generally yield normal
distributions of I.Q.s, but their curves often differ in spread or range.
Psychometrists measure this spread by the Standard Deviation
(S.D.) or sigma (σ), which is roughly equal to one-sixth of the
total range.2 Thus the Stanford-Binet scale generally gave an I.Q.
distribution with a S.D. of 15. All except a very small proportion
of its I.Q.s fell between 145 and 55 (i.e. 100 + 3σ and 100 − 3σ).
All but the highest and lowest 2¼ per cent fell between 130 and 70
[...] 25 to 30. This implies fewer I.Q.s near
100, and more very high or low ones. Thus, if the same group
of children took the Stanford-Binet and a Cattell group test, the
mean I.Q. on both tests might be 100, but some of the Cattell
I.Q.s might range all the way from about 30 to 170. The Terman-
Merrill scale is particularly confusing (cf. p. 55); for while the
[...] it goes as low as 12 at 6 years, and rises to [...]
years. This means that an I.Q. of [...]
represents very high intelligence (99th percentile) in a 6-year old,
but only moderately high (93rd percentile) in a 12-year old.

Fig. 5. Skewed Distribution obtained when upper items are too steeply
graded in difficulty. (Dotted Line represents Normal Distribution, with
Evenly Graded Items.)

1 For a full discussion, see Vernon (1956).
2 It is obtained by squaring the amount by which each score differs from
the mean, averaging these squares and taking the square root.

Similar differences occur between educational tests. For
example, Reading Quotients from a Graded Word test tend to show
[...] backward in the former may score [...]
latter. Yet this is purely a matter of units;
it does not signify that they [...]
vary in spread as between different markers or
different subjects. For instance, the range of percentage marks in
arithmetic is almost always much wider than that in English
examinations. As pointed out elsewhere (Vernon, 1956), such
marks cannot properly be compared or combined unless the
S.D.s are adjusted.
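
The adjustment of S.D.s before combining marks can be sketched as follows. The S.D. is computed exactly as the footnote describes (square the deviations from the mean, average them, take the square root); the common scale of mean 50 and S.D. 15, the function names and the marks themselves are assumptions chosen only for illustration.

```python
from math import sqrt


def standard_deviation(scores):
    """Root of the average squared deviation from the mean."""
    m = sum(scores) / len(scores)
    return sqrt(sum((x - m) ** 2 for x in scores) / len(scores))


def rescale(scores, new_mean=50.0, new_sd=15.0):
    """Put a set of marks on a common scale so that spreads are equated."""
    m = sum(scores) / len(scores)
    sd = standard_deviation(scores)
    return [new_mean + new_sd * (x - m) / sd for x in scores]


# Hypothetical percentage marks for the same five pupils.
arithmetic_marks = [20, 35, 50, 65, 80]   # wide spread
english_marks = [45, 50, 55, 60, 65]      # narrow spread

combined = [a + e for a, e in
            zip(rescale(arithmetic_marks), rescale(english_marks))]
```

Without the rescaling step, the arithmetic marks would dominate any total simply because their spread is about three times that of the English marks.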

The origin of such differences is somewhat obscure, though it
is certain that one major factor is the degree of heterogeneity of 
the items composing the test. In a group test like Cattell’s almost 
all the items are very similar in type, whereas in the Stanford- 
Binet they were quite diverse. In addition, certain abilities seem 
to have an intrinsically smaller spread than others. More concrete 
ones, such as counting pennies and copying a square or diamond, 
spread out the testees less than do reasoning and orientation items, 
which involve the mental organisation of abstractions.1

Among tests scored by age units such differences in spread are
unavoidable, and there are no grounds for deciding that any one
S.D. of I.Q.s is the ‘right’ or ‘true’ one. Thus the important
practical conclusion follows that the tester should never quote an
I.Q. or E.Q. without mentioning the test used, since the same
quotient from different tests often represents different levels of
ability. To make these comparable they can be adjusted so that
their S.D.s are equated. For example, Cattell I.Q.s can be made
approximately equivalent to Binet ones by multiplying any
deviation from 100 by 3/5. Thus Cattell 150 = Binet 130, and
Cattell 75 = Binet 85. Although Stanford-Binet is no longer used,
it provides a useful standard because we became accustomed,
during 1916 to 1937, to regarding I.Q. 70 as the rough borderline
of mental deficiency, below which fell some 2¼ per cent of the
population. Similarly, a reading or arithmetic quotient of 85, that
is 1 S.D. below the mean, has become widely accepted as a
borderline of educational backwardness, which cuts off the
bottom 15 to 16 per cent of the primary school population. Burt
provided a useful definition of backwardness which fits in with
this: the backward child is one who, in the middle of his school
career, at 10.0 years, cannot manage the work of the class a year
younger. Since this younger class would range in age from 8.5 to
9.5, our backward 10-year old would fall below average 8.5-year
olds, and so would obtain a quotient below 85. But we must
emphasise that the S.D. of 15 is a convention, and that the
quotients from most age-scored tests yield higher figures.
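
The two calculations in this paragraph can be checked with a short Python sketch. The Cattell S.D. of 25 is an assumption inferred from the worked examples (150 becomes 130, 75 becomes 85), and the function names are hypothetical; the age quotient is the traditional 100 times attainment age over chronological age.

```python
def convert_quotient(quotient, from_sd, to_sd, mean=100.0):
    """Re-express a quotient on a scale with a different S.D."""
    return mean + (quotient - mean) * to_sd / from_sd


def age_quotient(attainment_age, chronological_age):
    """Traditional quotient: attainment age as a percentage of actual age."""
    return 100.0 * attainment_age / chronological_age


# Cattell-type quotients (assumed S.D. 25) expressed on the Binet scale (S.D. 15).
binet_high = convert_quotient(150, from_sd=25, to_sd=15)
binet_low = convert_quotient(75, from_sd=25, to_sd=15)

# Burt's backward child: average 8.5-year attainment at 10.0 years of age.
backward_borderline = age_quotient(8.5, 10.0)
```

The same conversion applies to any pair of tests once their S.D.s are known, which is why a quotient should always be quoted together with the test that produced it.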

1 Cf. Sare (1951). Other influences, probably of lesser importance, are
discussed by Cattell and Vernon (1937).

The same is not true of percentiles. These are always rectangularly,
not normally, distributed. For by definition, one-tenth of all
testees obtain percentiles between the 100th and 90th, one-tenth
between the 90th and 80th, and so on. Hence percentile units are
very unequal at different parts of the scale. The difference between
the 95th and 85th or between the 5th and 15th percentiles is
more than twice as big as the difference between the 45th and
55th. (That is why Army S.G. units are chosen to cut off 10:20:
40:20:10 per cent respectively. These percentages correspond, 
rather inaccurately, to equal units along a normal curve.) So 
although percentile norms for tests are rather easily obtained and 
simply interpreted, they are apt to be misleading. For instance, 
one should never add together nor average percentile scores. 
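
The inequality of percentile units can be verified numerically. The sketch below assumes a perfectly normal distribution and uses the standard library's `statistics.NormalDist` to convert percentiles into S.D. units; the function name is hypothetical.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, S.D. 1


def sigma_gap(lower_percentile, upper_percentile):
    """Distance in S.D. units between two percentiles of a normal curve."""
    lo = _UNIT_NORMAL.inv_cdf(lower_percentile / 100)
    hi = _UNIT_NORMAL.inv_cdf(upper_percentile / 100)
    return hi - lo


tail_gap = sigma_gap(85, 95)     # about 0.61 S.D.
middle_gap = sigma_gap(45, 55)   # about 0.25 S.D.

# Inner boundaries of the five grades cut off at 10:20:40:20:10 per cent.
grade_bounds = [_UNIT_NORMAL.inv_cdf(p / 100) for p in (10, 30, 70, 90)]
grade_widths = [b - a for a, b in zip(grade_bounds, grade_bounds[1:])]
```

Ten percentile points near the tail thus cover well over twice the distance of ten points at the centre. The three inner grades come out at roughly 0.76, 1.05 and 0.76 S.D. wide, which bears out the remark that the 10:20:40:20:10 division corresponds only rather inaccurately to equal units.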


CRITICISM OF THE ASSUMPTION OF
NORMAL DISTRIBUTION

In the previous section we have admitted that a normal
distribution is often imposed on a set of test scores. This has led some
critics to dispute the whole conception of abilities being normally
distributed. Probably these critics are objecting chiefly to the
notion that intellectual differences are innate or genetically
determined. But the normal distribution is not dependent on the
conception of heredity. Whether a man’s intelligence level results
from the operation of numerous genes or whether, as others
believe, it results from the effects of factors in his upbringing, it
would still be reasonable to expect to find the majority of
intelligences near the mean and relatively few very high or very
low ones. We would agree that the range of this distribution may
well be magnified through the influence of socio-economic and
educational differences, but not that the tendency to normal shape
is an artefact. The evidence for this view is as follows:

(i) If we take any bodily skill which can be measured in
objective physical units, such as tapping speed, the distribution is
always normal or only slightly skewed. Proceeding to tests that
involve a stronger admixture of intellectual factors, there are no
signs of departure from this same shape. Thus there is no reason
to suppose that purely mental tests, which are scored in
psychological units, would differ either.

1 Cf. Simon (1953). One must admit, too, that test-constructors are apt to
argue in a circle, in that they claim to find normal distributions from tests into
which normality has been introduced, as it were, by the back door. If they try
out a large batch of items, and select equal numbers of items at each level of
difficulty (say, 85 per cent pass-rate and above, 75-84 per cent, 65-74 per
cent ... 14 per cent and below), then scores on the selected items will
inevitably tend to be normal.

(ii) The original Binet-Simon scale was constructed entirely
without reference to statistical distributions. Yet, over a limited
age-range at least, the I.Q.s it yielded conformed closely to
normality. Probably there were divergences in the youngest and in the
oldest age-groups because the scale failed to cover the lowest and
highest levels adequately, and because of the (already admitted)
irregularities of M.A. units above about 11 years.

Stanford-Binet and Terman-Merrill I.Q. distributions do sometimes
depart appreciably from normality (McMeeken, 1939;
Scottish Council, 1949), but this occurs chiefly at the level where
irregularity of M.A. units is most likely to arise. In any case, the
distribution at any one age is so dependent on the difficulties of
the particular items falling round about that level that the
observed irregularities neither prove, nor disprove, the hypothesis of
normality. In addition, there seems to be a ‘bump’ at the bottom
end of the curve, representing the imbeciles and idiots with I.Q.s
below 40 who are the result of pathological causes or rare genes
instead of the usual genetic plus environmental causes (cf. …).

(iii) Similar discrepancies do occur with group tests which, as
… (1950) has shown, sometimes give distributions that conform
more closely to what is known as the Beta Function than to
normality. It seems doubtful, however, whether
such tests possess sufficient headroom or …
accept this dogma, since it provides such a convenient basis for
test construction and for statistical analysis of test scores. And they 
infer that mental tests which do yield such distributions are 
scored in equal units, whereas the units of tests that fail to show 
normality (as in Fig. 15) are unequal. 

A qualification should be made regarding special abilities and 
attainments, as distinct from intelligence. It may well be that the 
upper reaches of achievement in special fields bring in a different 
principle. Burt (1943) suggests that measures of productivity are 
likely to be strongly positively skewed, i.e. that a very few indi- 
viduals may show really outstanding achievement. This is ob- 
viously true of income, and it seems to apply to artistic, literary 
and scientific output. The musical genius falls far beyond the 
normal distribution of musical ability that applies to the majority 
of the population. Possibly the same phenomenon occurs with the 
talents of school pupils, though it is difficult to say how far superior
is the exceptional musician, actor, artist or model-maker to the 
average until we can devise equal-unit measuring scales. 

One other point to note is that tests are occasionally con- 
structed with the intention of yielding non-normal distributions. 
If they are to be used for selecting, say, the best 20 per cent of a 
group, then they are more efficient if as many items as possible 
are chosen at an appropriately difficult borderline. The distribu- 
tion is then highly skewed and gives excellent discrimination at 
the 80th percentile, but does not bother about accurate
discrimination between average and poor testees.


1 Lord (1955) shows that the optimum pass-rate, for this purpose, should be
around 35 per cent rather than 20 per cent.


THE RELIABILITY OF INTELLIGENCE AND OTHER TESTS


When a child is retested or given a parallel form of a test, his
second I.Q. may differ to some extent from his first one for any
of a number of reasons. So far we have discussed only the first two
in the following list, and have seen that these can be controlled or
eliminated by substituting percentiles or standard scores for age
units and quotients.

1. Irregularities in scoring units, particularly over 10 to 15
years.

2. Differences in spread between different tests.

3. Chance differences: (a) inadequate sampling; (b) disturbances
at the time of one or both tests; (c) differences in health,
mood, motivation, etc.

4. Inaccuracies in norms.

5. Differences in the factor content, i.e. the kind of intelligence
the tests measure.

6. Regression effects.

7. Changes due to practice or coaching.

8. Environmental influences and personality changes producing
genuine rises or declines.

3. Chance Factors. A distinction should be drawn between influences that
affect some testees one way, some another, and those that are
apt to affect all members of a group, even if to varying extents.
The two types often overlap; for example, motivational
influences may operate systematically. But the distinction is useful
because, as we shall see, the effects of the former conform to a
certain statistical pattern.

Each item of a test is a sample of the ability we are trying to
measure, and if there are too few samples or weak ones, the
results are naturally unreliable. Some group tests cover the lower
levels adequately but tail off at the top end. For example,
if the test norms run like this:

Score    9   20   30   39   46   50   53
M.A.     7    8    9   10   11   12   13

… instructions are …
Testees vary too in their state of health, freshness or fatigue, and
in keenness or co-operativeness, anxiety or undue excitability. 
While the tester should do his best to control these and to induce 
an attitude of confidence, concentration and carefulness, their 
effects should not be exaggerated (cf. p. 74). At the same time we 
have admitted that very young, or maladjusted, children must be 
handled individually. 

Now if the differences between scores at a first and second test,
which have arisen from such non-systematic factors, are tabulated,
it is found that they too tend to fall into a normal distribution.
A majority of children score round about the same, or rise
or fall slightly, but a few show much larger discrepancies. This is
the statistical pattern already referred to. Moreover, we can
determine the spread of this distribution (now called the Standard
Error or S.E. rather than the S.D.) provided we know the
correlation coefficient between the two testings, and thus can predict
how frequently large discrepancies are likely to occur.¹ With a
correlation, or reliability coefficient, of 0.91 (for a test with S.D.
of I.Q. 15), the S.E. is 4½ points. This means that two-thirds of the
discrepancies or errors will lie between +4½ and −4½ I.Q. points;
and the median or typical child’s error will be only 3 points either
way.² But as the total range of any normal distribution is about 6
times its S.D., there will be occasional discrepancies of 14 points
or over either way. These are likely to occur only about once in
a thousand cases. Similarly, discrepancies of 2 × S.E. or ±9 points …

1 More accurately, the S.E. defines the extent of discrepancies between
obtained scores and the hypothetical ‘true’ scores that would be obtained if
the test were perfectly reliable. The discrepancies between any pair of
testings are nearly half as large again as those quoted; and if children are
frequently retested, their greatest discrepancies may be twice as large,
ranging up to about ±27 points.

2 This figure, which is approximately two-thirds of the S.E., is referred to
as the Probable Error or P.E.
whose scores it was calculated. If the group was very
heterogeneous, comprising children of a wide age-range, it will be
boosted; but if the group was unusually homogeneous, say
…-year-olds in a grammar school, it will be far lower. The usual
practice is to calculate reliability coefficients in a single primary
grade or in a complete age-group. Fortunately the S.E. is not
affected in this way by heterogeneity.

In a single age-group, a reliability coefficient of 0.91 is
regarded as reasonably high. Occasionally coefficients as large as
0.96 are claimed. The S.E. then drops to 3 points. Even here errors
of up to ±10 points are possible, though very rare. On the other
hand, reliabilities are often lower, especially over longer periods,
when some of the other influences listed above (factor content,
environmental effects, etc.) alter markedly. On retesting after 2
to 5 years or so the reliability coefficient may drop to as low as 0.6,
and the S.E. rise to 9½ points. The typical child still alters only by
±6 or 7 points, and half the children by this amount or less. But
the other half show discrepancies up to 25 or 30 points. Results
obtained from pre-school tests are extremely unreliable (cf. p. 64).
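
The figures quoted in this passage are consistent with the usual formula for the standard error of measurement, S.E. = S.D. × √(1 − r), and with a Probable Error of about 0.67 × S.E. The text does not state the formula; it is assumed here because it reproduces the quoted values. A minimal sketch in Python, with illustrative function names:

```python
import math
from statistics import NormalDist


def standard_error(sd, reliability):
    """Standard error of measurement, assuming S.E. = S.D. * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def probable_error(se):
    """Median absolute error of a normal distribution whose spread is `se`."""
    return NormalDist().inv_cdf(0.75) * se


if __name__ == "__main__":
    # Reliabilities mentioned in the text, for a test with S.D. 15.
    for r in (0.91, 0.96, 0.60):
        se = standard_error(15.0, r)
        print(f"r = {r:.2f}: S.E. = {se:.1f}, P.E. = {probable_error(se):.1f}")
```

With an S.D. of 15 this gives S.E.s of about 4.5, 3 and 9.5 points for reliabilities of 0.91, 0.96 and 0.60, in line with the values given above.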


It is commonly found also that scores are more unstable among
children above average than among dull ones … in the
Terman-Merrill test (p. 56). …

This considerable degree of unreliability means that we
should never think of a child’s I.Q. (or other test score) as
accurate to 1 per cent. Rather an I.Q. of, say, 95 should
be thought of as a kind of region or general level. For a few weeks
or months to come there are even chances that it falls between 92
and 98; and the odds are about 10 to 1 that his I.Q. lies within
the range 88 to 102. But the possibilities of much larger
discrepancies should not be forgotten. Over several years, say from 5
to 10 or from 11 to 15, the most we can say is that there is fair
certainty (i.e. 10 to 1) of its lying between 80 and 110.

However, this changeability should not be exaggerated. If we
studied school examinations in the same way, we should certainly
find them at least as variable, probably more so because of the
subjective element in marking. Apart from a few exceptional
cases, children do stay within the same intelligence band or
region, at least from 5 on; they do not normally shift from below
average to superior or vice versa. Moreover, as shown below, it
is still possible to make valuable predictions of educational and
vocational success several years ahead on the basis of tests. Note
also that two tests are more reliable than one, and a series of tests
at intervals better still. There is much to be said for applying an
individual test to each child shortly after entering the infant school,
then a group oral or a non-verbal test at 7 years, and group verbal
tests at 2-yearly intervals thereafter, and putting the results on a
cumulative record card. Many children will show a fairly steady
level throughout; some may present a rather consistent rising or
falling trend; the majority will be rather jerky, but wild
fluctuations will be comparatively rare.

1 … discrepancies from hypothetical true scores: the median variation
will be ±8 …

4. Inaccuracies in Test Norms. The results on one test are
sometimes higher all round than on another because the norms of the
one are too lenient or of the other too severe. Differences of 5 I.Q.
points or more are not uncommon, though the very thoroughly
standardised Moray House tests seldom vary more than 1 point.

Test constructors should not be blamed too harshly for this state
of affairs. The difficulties of getting really representative
cross-sections of the population are seldom realised.¹ Any one school
usually draws mainly from a certain socio-economic stratum;
hence the average I.Q. in a good suburban neighbourhood may
well be 15 points higher than in a poor slum or a remote rural
area. Picking a few good, mediocre and poor schools on the basis
of personal hunch or convenience is unlikely to yield an accurate
representation. The theoretical principles of systematic sampling
are well-established (cf. McNemar, 1940a), but it is not easy,
particularly for the private investigator, to apply them in practice.

At the secondary school selection stage, tests are often standardised …

1 Our recommendation for substituting deviation scores at each age
for age units and quotients complicates matters still further, for it
requires that samples at every age should be representative in spread as
well as in average. Thus, strictly, we should not miss out the mental
defectives at the bottom end or the private and public school pupils at the
top end. Perhaps it is best tacitly to omit these, taking 15 as the agreed
S.D. in such samples, while admitting that the figure should be somewhat
higher in really complete age-groups.
… performance in most subjects tended to rise between … and …,
so that by 1939 the norms for many of Burt’s tests, … The
…sation precludes any easy solution … standardised …;
but other test users would be better advised
to confine themselves to recently standardised tests, or to draw
up percentile norms for older tests in their own schools or
districts.


5. Changes in Test Content …
timing. We have already drawn attention to the variations in
factor content of the Terman-Merrill scale at different ages (p.
55). Probably the main reason why pre-school and infant tests
show so little correlation with later intelligence is that the
former rely mainly on perceptual and motor performances. Not
until children begin to express their ability largely through
language, after about 5 years, can reasonable stability be expected.
We are far too readily influenced by mere names: and just because
tests for 3-year, 6-year, 10-year children and adults are labelled
intelligence tests, we expect them all to measure the same thing.

6. Regression Effects. If we select a group of tall parents,
averaging say 10 inches above the mean for the population, and measure
the heights of their offspring when they grow up, the average of
the latter will be found to be only about 5 inches above the mean.
Conversely, if we take very short parents, their children will also
tend to be of below average height, but not so much so as their
parents. Psychometrists call this regression to the mean. It is not
difficult to see why this must occur. There is a correlation of only
0.5 between the heights of parents and offspring, and this means
that many of the tallest parents have relatively short offspring, and
some of the tallest offspring are born to moderately tall parents.
Thus it is equally true that if we select tall (or short) offspring, the
mean heights of their parents likewise regress to nearer the mean.
Similarly, then, when we select a group of superior children on
the basis of one test, we must expect to find their average score
somewhat lower on a second test; conversely, a group of dull or
backward ones will always show an apparent rise. But this arises
simply from the fact that the two tests do not correlate perfectly,
and it may have no special psychological significance. It does not
represent some kind of compensatory weakness among bright
children, nor strength among dull children, any more than the
regression of heights represents a compensatory tendency among
tall or short parents and offspring. Yet time and again this error
crops up in interpreting test results. Here are some examples.


1 The amount of regression depends directly on the lowness of the correlation.
If the mean score of the selected group is x above the mean of the whole group,
then the mean at the second test will be rx (provided the S.D. remains the same).
Hence, with r = 0.5, the average deviation of parents from the mean is halved
in their offspring. And in Example (i) below, if r = 0.75, the mean I.Q. of
120 drops to 115.


(i) Pupils selected for grammar school usually have I.Q.s of 110
or over and average about 120: retested a year or two later the
average has apparently fallen to 115.

(ii) Children certified as educationally subnormal, chiefly on
the basis of a verbal (Terman-Merrill) test, have I.Q.s around 70.
Since the correlation of verbal with performance tests is only
about 0.6, their mean performance test I.Q. must be about 82.
But there is nothing startling about their practical ability being,
on average, superior to their verbal ability. It happens because
some of the lowest scorers on performance tests did not fall in the
original E.S.N. group.

(iii) Children who are backward in ordinary school work
apparently make larger gains when taught by films (or any other
novel teaching device) than those who are advanced. The reason
is merely that the correlation between ordinary school work and
the test applied after film (or other) instruction is less than 1.0.

(iv) A group of backward readers is placed in a special
class, and after a term or so … Only if … an
equally backward group of pupils who are not coached is this
inference justified (cf. Curr and Gourlay, 1953).
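
The regression rule given in the footnote (a group lying x above the mean on the first test is expected to lie rx above it on the second) can be checked numerically. A minimal sketch, assuming equal S.D.s on both tests and a population mean of 100; the function name is illustrative, and the group mean of 70 in the second call is an assumed value:

```python
def expected_retest_mean(group_mean, correlation, population_mean=100.0):
    """Expected mean on a second test for a group selected on the first.

    Assumes both tests share the same mean and S.D., so that a deviation
    of x on the first test regresses to r * x on the second.
    """
    deviation = group_mean - population_mean
    return population_mean + correlation * deviation


if __name__ == "__main__":
    # Example (i): grammar school entrants averaging 120, with r = 0.75.
    print(expected_retest_mean(120.0, 0.75))
    # A group averaging 70 on a verbal test, with r = 0.6 against a
    # performance test (the mean of 70 is assumed for illustration).
    print(expected_retest_mean(70.0, 0.60))
```

A selected group's apparent fall or rise on retest is thus what the correlation alone would predict, before any psychological explanation is invoked.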
7 and 8. Adequate discussion of these effects requires …
… or more so, that differences in norms, in units of measurement, in
S.D., or in factor content, or chance influences (unreliability)
are responsible.


COMPARING DIFFERENT ABILITIES

So far we have been concerned mainly with hazards in the
interpretation of scores on a single test or parallel tests. Unreliability
is going to arise, too, in comparing different tests, for example
Educational with Intelligence or Performance test results.
Inaccuracies in the norms of either test,
differences in spread, or different degrees of practice can obviously
stultify one’s conclusions. But apart from these, the
reliability of a difference between two tests depends not only on
the reliabilities of the separate tests, but also on the smallness of
the correlation between them. If, as often happens, they overlap
rather highly, the unreliability of any differences is enhanced; in
other words, only the biggest score differences have any
significance at all. Table IV illustrates the reliability of score
differences at certain levels of test reliability ($r_{aA}$ and $r_{bB}$) and
inter-correlation ($r_{ab}$).
TABLE IV

RELIABILITY COEFFICIENTS OF DIFFERENCE SCORES, $r_{(a-b)(A-B)}$

[Table entries not recoverable.]

In giving the Terman-Merrill scale, it is very tempting to interpret
relative success or failure on different types of items as showing
good or poor verbal ability, practical ability, immediate
memory, etc., though McNemar (1942) discourages this practice.
But the reliability of any one such item-type might be as low as
0.7 and its correlation with another item-type 0.5 or over; in
which case discrepances between scores on the two types would
have a reliability of less than 0.4. This would mean that a child
who scored at, say, 8-year level on one type might well score at
6- or 10-year level on the other type by pure chance. The odds
are only about 10 to 1 that he is genuinely superior or inferior.
Discrepances between Verbal and Performance I.Q.s on the
Wechsler scales are somewhat more trustworthy, their reliability
approaching 0.7. But great caution is needed in interpreting scores
on particular sub-tests which appear to be higher or lower than
the testee’s general level.

1 Table IV is based on the formula:

$$r_{(a-b)(A-B)} = \frac{\tfrac{1}{2}(r_{aA} + r_{bB}) - r_{ab}}{1 - r_{ab}}$$

2 … (1949) provides a useful graph showing the proportions of score
differences which are too large to be due to chance at various levels of
reliability and inter-correlation.
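
The reliability of a difference between two scores on scales of equal S.D. is conventionally taken as the mean of the two reliabilities minus the inter-correlation, divided by one minus the inter-correlation. A minimal sketch under that assumption; the function name and the sample values are illustrative only:

```python
def difference_reliability(r_aa, r_bb, r_ab):
    """Reliability of the difference between two equally scaled scores.

    r_aa, r_bb: reliabilities of the two tests; r_ab: their inter-correlation.
    """
    if not r_ab < 1.0:
        raise ValueError("inter-correlation must be below 1")
    return ((r_aa + r_bb) / 2.0 - r_ab) / (1.0 - r_ab)


if __name__ == "__main__":
    # Two item-types, each with reliability 0.7, correlating 0.5.
    print(round(difference_reliability(0.7, 0.7, 0.5), 2))
    # A higher inter-correlation lowers the reliability of the difference.
    print(round(difference_reliability(0.7, 0.7, 0.6), 2))
```

The second call shows why overlapping tests are troublesome: the more two measures share, the less of their difference is anything but error.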
There has been much controversy also over I.Q.-E.Q. differences,
which have traditionally been considered by British educational
psychologists to show over- or under-achievement in
school work. Often the ratio E.Q./I.Q. × 100 or E.A./M.A. × 100 is
… If his average E.Q. is 110 but his
I.Q. is 140, then his …

… to be normal, i.e. there would be much the same numbers whose …

… child must obtain the same E.A. as M.A. on well-standardised
tests. And it is just as easy …
… on account of specialised ability, or through good schooling,
favourable environment … as it is to account
for others falling below …

… the one we have put forward regarding reliability. If the
reliabilities of intelligence and educational tests average
0.90 and their inter-correlation
0.75, then the reliability …
Supposing we selected the 10 most retarded children in a group of
100, on the basis of one set of intelligence and educational tests,
and then gave a second parallel set of tests, only 4 of the original
10 would again be diagnosed as highly retarded; the remaining 6
would be replaced by others who had previously been found to be
only moderately retarded. In other words, the majority of low
A.Q.s probably represent nothing more than chance differences.

A further objection arises on account of regression to the mean.
Since the correlation between E.Q. and I.Q. falls below 1.0, the
child of below average intelligence inevitably tends to do rather
better at educational tests, and the above average one worse. In
other words, it is very difficult for the high I.Q. child to work ‘up
to capacity’ or for the low I.Q. one to work ‘below capacity’.
Again taking the 10 per cent of a group with lowest A.Q.s, it
will be found that about twice as many with I.Q.s of 115 and over
are included as of those with I.Q.s 85 or under. And if the
correlation between I.Q. and E.Q. drops from 0.75 to 0.60 (as when
a non-verbal group intelligence and a reading test are compared),
the high I.Q. children diagnosed as retarded may outnumber the
low I.Q. ones by 4 to 1.¹ Now, on general psychological grounds,
we would expect to find ‘under-functioning’ more frequently
among low I.Q. children, who more frequently come from homes
unfavourable to education. Thus if we must contrast E.A. with
M.A., we should at least make due allowance for regression (cf.
Cureton, 1937).

Apart from these statistical difficulties, we have already seen
in Chapter II that it is fallacious to contrast intelligence as wholly
inborn with attainment as wholly acquired. Indeed, psychologists
in America banished the whole conception of the Achievement
Quotient many years ago; and those in Britain would do well to
follow them. We will take up again in the last chapter the question
of the relevance of intelligence test performance to education.


1 Such striking differences have not been noted in the literature on the
subject, probably because the tests compared have seldom had …
norms or standard deviations. A … survey of the fallacies of, and
weaknesses in, the concept of the Achievement Quotient is provided by Crane (1959).


USING TESTS FOR PREDICTION


The object of applying tests is to make predictions or decisions
about people: that a child is so backward in arithmetic that he will
make no progress in school without remedial teaching, that he is
sufficiently able to benefit from a grammar school education, that a
candidate will be suitable for a Civil Service job, that the distribution of
reading ability in a given area calls for special measures to reduce …,
and so on. A host of fresh problems arise at this point, and we can
touch on them only briefly.
1. The validity of the tests for the particular purpose is obviously
crucial, and we have already seen that this may be difficult to
establish (p. 46). Even in secondary school selection, for example,
where it is generally quite straightforward to follow up pupils
admitted to grammar schools, it is doubtful whether examination or
other marks obtained after one or more years in the school provide
an adequate criterion of ‘benefiting from’ this type of education
(cf. Vernon, 1957). In the vocational field it is often found that a
worker’s satisfactoriness to his employers is almost unrelated to
his own satisfaction with the career. Which of these criteria, then,
should we try to predict?
2. It is seldom that one would expect a single test to provide
all the information needed (except perhaps in establishing the
degree or type of backwardness in the elementary stages of a school
subject). Predictions are usually improved by including further
tests of relevant qualities, though often less than we would
anticipate. It is little use, for example, …, since they cover so
much the same ground. And it is very doubtful policy adding
subjective judgments … tests. The clinical psychologist …
data obtained by observation and interview, and that in this way
he can predict the individual’s reactions to various vocational …
… clinical judgments (cf. Vernon, …).

The real advantage of the clinical, as against the statistical, … is
that it is far more flexible; it can take account of a wide … which
may be relevant only in individual cases,
and suggest a much wider variety of treatments or other decisions
to suit such individuals than can a more objective selection
procedure. The disadvantage is that the clinical psychologist can only
use his ‘intuition’ and ‘experience’ in deciding how much weight
to attach to each of the relevant factors; whereas the statistical
psychologist, in predicting some clear-cut outcome such as
educational or vocational success, can work out from his follow-up
investigations just what weight should be given to each of his
tests in order to produce maximum accuracy of prediction.

3. In any case, predictions should be regarded as actuarial, or 
possessing only a certain degree of probability. For example, we 
shall see in Chapter Ten that there are strong odds that a child 
with a high I.Q. will do well scholastically and vocationally, but 
we can never be certain whether any particular individual will 
not be an exception to this general trend. A large number of
influences not covered by the tests (nor by any other sources of 
information, such as the clinical interview) affect later success. 
Nevertheless, it is noteworthy that errors in prediction themselves 
tend to be distributed normally, i.e. there are a lot of small ones 
and fewer large discrepancies. Their extent depends, as we might 
expect, on the lowness of the correlation between the predicting 
test or tests and the criterion. 

Within a group of pupils selected for grammar school, for
example, the correlation between combined selection tests and
later school grades is normally around 0.4 to 0.5. Suppose these
school grades to be expressed in terms of marks ranging from
about 90 down to 30 per cent. Then we are likely to find that the
top third according to the selection tests will average about 10
per cent higher than those who were in the lowest third on
admission. But there will be so much variation that occasional good
selectees may drop to a mark of 40 per cent, and occasional weak
ones range up to 80 per cent. Just about one-quarter of the latter
will get better marks than a quarter of the former.


4. Although tests such as those used in selection at 11-plus may
not seem to be very efficient in predicting the exact degree of
success within the selected group, their value may be much more
apparent if we ask how well they separate the best 20 per cent of
the age-group (or any other proportion) from the bottom 80 per
cent. To estimate this, we can ‘correct’ the original correlations for
‘homogeneity’, and are likely to find that the correlation of 0.4 to
0.5 rises to 0.8 or over. This represents the agreement between the
tests and subsequent performance, had we been able to follow up
the whole age-group (cf. Vernon, 1957).
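
One common way of making such a correction is the formula for direct restriction of range, R = rk / √(1 − r² + r²k²), where k is the ratio of the S.D. in the whole group to the S.D. in the selected group. The text does not say which formula or which value of k was used. The sketch below assumes that the top 20 per cent are selected on a normally distributed test score, which is an idealisation, and its function names are illustrative:

```python
import math
from statistics import NormalDist


def sd_of_top_fraction(fraction):
    """S.D. of a unit normal variable truncated to its top `fraction`."""
    nd = NormalDist()
    cut = nd.inv_cdf(1.0 - fraction)
    hazard = nd.pdf(cut) / fraction
    return math.sqrt(1.0 + cut * hazard - hazard ** 2)


def correct_for_homogeneity(r, sd_ratio):
    """Estimate the whole-group correlation from a selected-group one.

    sd_ratio is (whole-group S.D.) / (selected-group S.D.) on the test.
    """
    return r * sd_ratio / math.sqrt(1.0 - r * r + r * r * sd_ratio ** 2)


if __name__ == "__main__":
    k = 1.0 / sd_of_top_fraction(0.20)
    for r in (0.4, 0.5):
        print(f"r = {r}: corrected = {correct_for_homogeneity(r, k):.2f}")
```

Under these assumptions the corrected values come out at roughly 0.7 to 0.8, of the same order as the figure quoted above.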

The implication of such a correlation is that some two-thirds of
those passing the selection tests do well later on, though one-third
are less successful than an equal number of non-selected pupils
would have been. As shown in Table V, 13 per cent are correctly
selected and 73 per cent correctly rejected; but 14 per cent in all
have been wrongly diagnosed.


TABLE V
ILLUSTRATING THE PROPORTIONS OF PUPILS
CORRECTLY SELECTED AT 11-PLUS

                      Good per-   Poor per-
                      formance    formance    Total
                      later       later

Selected by tests        13           7         20
Rejected by tests         7          73         80

Total                    20          80        100

Yates and Pidgeon … efficiency of selection …
school teachers’ estimates …


5. The proportion of erroneous … depends not only
on the level of correlation but on the Selection Ratio, that is …
… be many more erroneously chosen unless the validity
coefficient is very high. On the other hand the costs of testing in
time, materials, etc., may become excessive when only a small
number of individuals is to be chosen.
The complexity of conditions affecting the practical use of tests
has been well brought out in a book by Cronbach and Gleser
(1957). Sometimes a highly valid instrument like the intelligence
test is less appropriate than a broader technique like the interview,
or play-group observation, which provides leads or suggestions
that can be followed up in individual cases. Again, the cumulative
procedures of the classroom or the instructional group, where
individuals are being diagnosed as their education or training
proceeds, and their treatment accordingly adapted, may be more
… for all into superior and …
… to providing formulae for working out the most economic and
efficient testing …

… should be made of the difficult … of two or more …
… like differential reliability (p. 119), … between the tests.
Unfortunately, as we have …,
abilities are rather highly inter-correlated. There can be no doubt
that more useful tests for educational, vocational and clinical
purposes could be devised if this need for low correlations with
other tests and other criteria were realised.


Chapter Eight


THE EFFECTS OF COACHING AND PRACTICE
ON INTELLIGENCE TESTS


This topic has assumed considerable importance since tests began
to be applied on a large scale for selection, that is, in
situations where there are strong incentives to do well. … we
shall discuss it at greater length than …¹

As early as 1920 it was found, from experiments with the Army
Alpha tests, that scores could be …
to some 5 I.Q. points simply by taking one previous … test;
and it was soon shown that more intensive practice or coaching
would lead to larger gains. But this hardly mattered so long as
tests were chiefly employed in …
diagnostic purposes. In child guidance …, for
example, the testee (or his parents) …
‘the test’ or to obtain …
… to neglect possible practice or coaching … Moreover,
individual Binet tests were the main instrument for individual
diagnosis, and here the experienced tester can usually recognise
from the slickness of the testee’s responses …
or can cross-question him. (At the same time it is difficult to
gauge how much allowance to make when such familiarity is
discovered.)

But it is very different in group testing when the child’s …
educational and subsequent vocational career, or the adult’s job,
may depend on the result, and when the tester cannot …
of coaching (apart from a rise in the …
score for the whole group). Candidates, together with
…, look on … most
… ones being prepared every year. …
schools can get access to older, more or less parallel versions, and
several publishers actually issue books of questions similar to
those found in the standard tests. Many primary schools coach
their pupils on these, even having Intelligence Tests as a subject
on the time-table. Education Authorities generally prohibit or
discourage this practice, but it is very difficult for teachers to hold
out against parental pressure or against the temptation to secure
as many passes as possible for their own school pupils. A large
number of parents buy the commercial books for coaching at
home, and some pay considerable fees to unscrupulous agencies
that offer private coaching and imply that they can get any child
‘through the 11-plus’ for so many guineas.

1 A fairly complete bibliography of published research may be found in
the author’s article (1954).

Now that intelligence and other objective tests are generally 
included in the selection of Civil Servants and of officers and 
apprentices in the Defence Services, it is likely that coaching goes 
on for these as well, though we have no evidence so far that it is 
widespread. Doubtless the same thing would occur if tests began 
to be used for university entrance. But it is most serious among 
10-11 year olds because of the atmosphere of strain that ac- 
companies it and the inevitable distortions of good junior school 
education through over-concentration on training for the tests 
(cf. Vernon, 1957). Be h 

The layman is apt to blame the 11-plus examination as such, and
particularly the intelligence test which he does not understand and
therefore suspects, for this state of affairs. But clearly the fault
lies with the system which demands a rigid separation of the ‘wheat’
from the ‘chaff’ at so early an age, and which provides for less than
10 per cent of ‘wheat’ in some areas, over 30 per cent in others; also
with those parents who desire their children to win the cachet of
attending a grammar school but who do little to encourage or help them
once they are in the school, or who refuse to recognise that they may
be unsuited to advanced academic studies. Just as much strain and
coaching would take place whatever the form of the examination itself,
and we shall see below that intelligence tests are actually less
coachable than other, more conventional types of examination. It is
quite absurd, therefore, just because psychologists have shown that
coaching can produce rises in I.Q.s, to conclude that intelligence
tests should forthwith be discarded.
Let us first distinguish various degrees or types of practice or
coaching:

A. Testees are taught the actual answers beforehand.

B. They have taken the same test one or more times before, but have
not been told the right answers nor received other help.

C. Testees have taken similar tests before, without instruction.

D. They have not taken a parallel test, but have been coached on more
or less similar items, that is, told the right answers, had the
underlying principles of the items explained, and been given general
hints on working carefully, not wasting time on difficult items, etc.

E. A combination of C + D, that is of practice and coaching. 

Type A need not detain us: it does sometimes occur and, 
naturally, entirely upsets any test results. 


B. PRACTICE ON THE IDENTICAL TEST 


This is important, since children are often referred to an educational
psychologist who does not know that they have been tested recently,
or, if he does know, still wants to apply tests that lack alternative
versions, such as WISC, Stanford-Binet or Collins-Drever performance
tests. Repetition of verbal tests within a few months generally
produces a rise of about 5 I.Q. points. Performance test batteries
show just about twice as much. Similarly, during World War II, Army
and Navy psychologists had no parallels to their standard group tests,
and these showed rises averaging 2.1 to 8.6 points on different tests.
According to Heim’s researches (1949-50), repeating tests at weekly
intervals produces further [...]

1. The amount of improvement is limited; increased practice produces
diminishing returns.

2. Although we have to quote average rises, there are great variations
between individuals. With an average of 5 points, some testees gain up
to 20 or more points and others actually lose as much as 10 points.


3. There are differences in improvability between different tests. The
research literature on this point is confusing, since gains in raw
scores are usually reported, and these are quite non-comparable unless
the tests have the same Standard Deviation. Here we have tried to
reduce them to the same scale as that of Stanford-Binet and Moray
House I.Q.s, with a S.D. of 15. It would appear that the more complex
and unfamiliar the test material and instructions, the greater the
possibilities of improvement. The most straightforward tests like
Vocabulary, Comprehension, Information and Creative Opposites
generally yield the smallest rises; complicated Analogies, Abstraction
and Classification tests, particularly those employing unfamiliar
spatial or non-verbal material, are much more susceptible to
improvement.

4. These practice effects are remarkably lasting. The evidence
suggests that three-quarters of the gain found after one week is
retained up to six months, and half of it still remains after one
year.
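
The rescaling mentioned under point 3 can be sketched in a few lines.
This is only an illustration, assuming the simplest linear conversion
(raw gain divided by the raw-score S.D., multiplied by 15); the
function name and the two example tests are hypothetical and are not
taken from any of the studies cited.

```python
def gain_in_iq_points(raw_gain, raw_sd, iq_sd=15.0):
    """Express a raw-score gain on a scale whose S.D. is iq_sd.

    Assumption: a plain linear rescaling, which ignores any difference
    in the shape of the two score distributions.
    """
    if raw_sd <= 0:
        raise ValueError("raw_sd must be positive")
    return raw_gain / raw_sd * iq_sd


# Two hypothetical tests that both show a 4-point raw gain on retest.
test_a = gain_in_iq_points(raw_gain=4, raw_sd=12)  # narrow raw-score spread
test_b = gain_in_iq_points(raw_gain=4, raw_sd=20)  # wide raw-score spread

print(f"Test A: {test_a:.1f} I.Q. points; Test B: {test_b:.1f} I.Q. points")
```

The same raw gain comes out at about 5 points on the first test and
about 3 on the second, which is why raw gains from different tests
cannot be compared directly.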


C. PRACTICE ON PARALLEL TESTS 


Though the effectiveness of practice at other similar tests is smaller
than that of practice at the identical test, it is still considerable.
When Form M of Terman-Merrill is given shortly after Form L, the
average gain is 2½ points only. But in a typical group intelligence
test the gain from taking a single previous parallel version averages
nearly 5 I.Q. points. Further practice tests produce smaller gains,
totalling some 10 points after four or five tests. Thereafter the
results are irregular, or the scores may decline, perhaps because the
testees get bored. If children are already fairly sophisticated
testees, the gains may be reduced to about half these amounts. Thus in
a large-scale experiment with children, Watts and others (1952) found
the mean I.Q. on an eighth parallel test only 6 points higher than on
the first.

Several other factors besides previous familiarity may produce rather
larger or smaller gains than those just quoted.
1. Age seems to make little difference. Adult recruits show very
similar rises to 10-year children. Sex differences are sometimes
reported, but do not consistently favour boys or girls.

2. The effects of the initial ability of the testees are difficult to
evaluate, since the natural consequence of regression (p. 117) would
be that below-average children would improve, and above-average ones
decline on the second test. However, by using initial + final score as
a criterion, or by comparing percentile levels, it is generally found
that bright children (I.Q. 120) gain about twice as much as dull ones
(I.Q. 80). Peel (1951) claims that among still brighter ones (I.Q. 130
to [...]) the improvement is less great. This might be attributed to
lack of head-room in the test, but Peel considers that such children
[...] so easily on the first occasion that they have little room for
improvement.
3. When untimed and speeded versions of the same tests are compared,
the rises are 75 to 90 per cent as great on the former as on the
latter. Thus practice does not merely help in understanding
instructions quickly. It is probable that the testees’ acquisition of
appropriate ‘sets’, i.e. methods of tackling the various types of
items, together with the reduction of anxiety on the one hand and
carelessness on the other, may be more important.

4. We have dealt already with the improvability of different kinds of
tests. Comparisons have also been made between the omnibus and the
battery types and no difference found. What is important, though, is
the degree of heterogeneity or diversity of types of items. Some
battery tests such as Army Alpha and Otis Advanced give unusually big
rises because they include such varied sub-tests, whereas Moray House
tests are so constructed as to contain more homogeneous items.
However, we need not avoid heterogeneous tests for this reason, since
they are likely to be more valid in predicting future success than a
narrower and more homogeneous instrument.

In the early days of group testing, Thorndike (1922) claimed that
practice effects could be counteracted by making test instructions
clear and comprehensive and by providing adequate sample items or
short practice exercises before the test started. This has been
accepted by all test constructors, but more recent experiments with
11-year olds showed the conventional explanation and practice sheet to
be totally ineffective. It was essential for children to do a complete
test under examination conditions for the major practice effects to be
overcome.


D. THE EFFECTIVENESS OF COACHING ON SIMILAR
TEST MATERIALS
It is unfortunate that psychologists themselves have disagreed widely
on this point and have thus confused teachers and the general public.
Numerous experiments have yielded average gains of 15 or even 18
points of I.Q. after quite moderate amounts of coaching; others
reported rises of only 5 or 6 points when fairly extensive coaching is
given under ordinary school conditions, and it has even been stated
that practice without instruction is as effective as, or more
effective than, such coaching. However, a large amount of
investigation has revealed the main reasons for these discrepant
conclusions, and we can now generalise as follows: coaching which
includes practice at taking complete tests (Type E) does produce quite
large gains, whereas coaching carried out by teachers and parents from
the items in published books, or on the basis of their own ingenuity,
i.e. coaching without practice, is singularly ineffective, regardless
of how protracted it is. The other major factor is the testees’
previous sophistication. Those who have had no previous experience of
tests show rises about twice as large as those who have taken several
educational or intelligence tests, or been coached at school, in the
past. Thus, under present-day conditions, where the majority of
British children (at least in urban areas) are to some extent
‘test-wise’, the total average gain from taking two practice tests + a
few hours of interspersed coaching is not 5 or 15 but about 9 points.
This is confirmed, not just by small-scale experiments but by
practical trials including complete 11-year age-groups in large
boroughs. With similar children, a single practice test gives only 3-4
points, and several practices 5-6 points. Thus coaching does add
something, but remarkably little, and obviously far less than teaching
does in the case of ordinary school examinations. As always, there are
a number of other relevant factors to be considered.

1. Big individual differences in ‘coachability’ are found. Thus some
14 per cent of children gain from 15 up to 25 or more points, while
some 5 per cent show no gain at all, or even lose despite all that
coaching and practice can achieve. In most Education Areas there will
always be some children (e.g. from private or remote rural schools)
who have no previous experience of tests at all, and they are likely
to be handicapped much more seriously, by an average of 12 or more
points, in comparison with fully coached children. A detailed study of
the progress of pupils [...] showed that, while there was no
appreciable average improvement beyond the fifth test, some reached
their maximum earlier, some later. Some then remained steady, others
fluctuated considerably.

2. The maximum effects of coaching are achieved very quickly, and
teachers, parents or others who carry out larger amounts are more
likely to produce falls than further rises. Watts reports experiments
comparing groups coached for 3, 6 and 9 hours. At the end the 6-hour
groups were distinctly poorer than the 3-hour; the 9-hour recovered
slightly but did not surpass the 3-hour.

3. The effects of coaching and practice tend to be highly specific,
i.e. there is relatively little transfer to other types of test items
or to different testing conditions. Thus we have already seen that
coaching without practice under examination conditions yields very
poor gains. In one experiment, no effect on Verbal Analogies and
Classification could be discovered from coaching on Non-verbal
Analogies and Classification, nor vice versa. Indeed, there is
evidence that experience at one type of test sometimes decreases
ability at other dissimilar types. When a battery of several diverse
tests is applied, it is found that the average scores vary slightly
(1 or 2 per cent only) according to the order in which the tests are
done. This must mean that the ‘set’ or method of work appropriate to
any one test has [...]

[...] Council’s surveys of intelligence at 11 years, children in 1947
who lived in urban areas where they were likely to become familiar
with tests showed a rise of 3.2 points over 1932, whereas those in
more remote areas, where tests were still little used, showed a rise
of only 0.4 points. Again, the Moray House organisation has found the
British norms for its intelligence tests to rise from 1945 to 1955,
till now they seem to have become stabilised at a level 6 points above
pre-war (Pilliner, 1959). In other words, the present-day
test-sophisticated child with an I.Q. of 100 on current tests scores
I.Q. 106 on parallel pre-war tests. Of course, this may represent a
genuine rise due to improved education, health and social conditions,
but an explanation in terms of test-familiarity seems more plausible.

4. It is extremely likely that some teachers are more effective
coaches than others, though it has been difficult to prove this
because of the large random variations that occur in the gains of one
child, or one class of children, as compared with another. We cannot
specify, either, what constitute successful or unsuccessful coaching
methods; probably they are much the same as good and poor teaching
methods in general.

5. The effects of coaching seem to fade more rapidly than those of
practice, though there is a paucity of evidence. In one experiment a
group tested 2 months after coaching showed only two-thirds the gain
of a group tested 1½ weeks after. The following results, adapted from
Greene’s study (1928) of the Stanford-Binet, are interesting.

TABLE VI

EFFECTS OF COACHING ON STANFORD-BINET I.Q.s

                                             Mean Gain when Retested after:
                                             3 wks.   3 mths.   1 yr.   3 yrs.
Children coached on test itself              [...]    [...]     [...]   [...]
Children coached on similar test material    7.9      7.6       5.6     1.5
Controls                                     5.0      2.6       3.3     0.6

Several small groups of children were coached either on the actual
test items or on similar ones, and retested after various intervals.
Controls were not coached, but nevertheless showed some average rise
as a result of taking the test. The effects of the coaching on similar
material and the practice effect (in the control group) are still
quite strong after a year, though the coaching on the test itself has
faded. They do not disappear completely until, after three years, the
children have progressed to quite a different part of the scale. This
suggests that insufficient account has been taken of practice effects
in some of the investigations of environmental influences on
intelligence (cf. [...]).

6. Age and sex of the testees make no consistent difference. The
influence of initial ability is also uncertain, but with coaching the
moderately bright still tend to gain most. This means, incidentally,
that improvability is greatest around and about the borderline level
for selection. Those with I.Q.s [...] gain an average of 11-12 points
instead of the 9 points quoted above.

7. The types of test item most susceptible to improvement through
practice also seem to be the most coachable.


DISCUSSION 
It is sometimes claimed that practice or coaching matters less at the
selection stage because intelligence carries only part of the weight.
Combined with English and arithmetic tests, [...] represent only 3
points [...]. However, this is fallacious since the English and
arithmetic tests probably offer at least as much scope for coaching.
That is to say, children can gain not only in their knowledge of these
subjects as such, but also in their facility [...]


[...] En.Q. + Ar.Q. If the borderline were fixed to select 20 per cent
of pupils, [...] 30 per cent [...] unsophisticated pupils. [...] The
writer, testing in a county [...], found rural schools markedly
inferior [...] and speed of mechanical arithmetic, that is, in
material of a type likely to be practised by many larger urban
schools, though the equal of the town children in vocabulary and
spelling, that is, in some of the basic skills needed for grammar
school work.

While it is clear, then, that coaching + practice effects are large
enough to make a considerable difference in selection, it is equally
true that ‘intelligence’ cannot be taught in at all the same sense as
are school subjects. The total increase is limited and there is no
continued improvement with continued training. We have shown, too,
that the effects tend to be specific; in other words, training in
intelligence tests probably does nothing to increase intelligence at
anything else. Even the general sophistication effect merely means
familiarity with the kinds of items and methods of work appropriate to
most psychological tests. In a later chapter we shall see that some
types of school curriculum and some teachers probably do help to
stimulate the growth of all-round intelligence. This is to be
welcomed; and the consequent increase that it brings in intelligence
test scores is wholly legitimate. But a type of instruction which
concentrates merely on increasing test scores is equally to be
deplored.
What can be done to counteract it? If coaching and practice were
wholly specific in their effects, it would be sufficient to devise new
types of test on each occasion. But we have seen that there is quite a
considerable general element. Moreover, the ingenuity of test
constructors is not unlimited, and it would be extremely expensive and
time-consuming to carry out the necessary experimental trials of novel
tests; they would seldom be as reliable or valid as the already
established varieties. Certainly some alternation of types of test,
and greater use of those types which are known to be less coachable,
would help to reduce the present unfairness, but would do nothing to
reduce the pressure on the children and the waste of time on coaching
at school or at home.
Now injustices arise in so far as some candidates have been coached
[...], others not. When all candidates have been coached, the
selection borderline has to be raised and the test norms altered, but
no one is unfairly handicapped. It is the existence of ‘differential’
coaching of some children and not of others that really matters. Hence
one solution is to authorise teachers in primary schools to give
coaching and practice.1 This is all the more plausible since so little
training is needed to bring about the maximal [...]. Candidates who
had received a great deal of coaching would gain no further advantage;
they might even do worse than those who had the small amount of
authorised coaching. It would also have the positive advantage of
showing all children what the selection examination was going to be
like, thus helping to reduce their anxieties. The amount of practice
and coaching required should be decided in the light of local
conditions. In areas where grammar school provision is lavish, where
coaching is still fairly rare, and where adequate use is made of
primary teachers’ estimates or of other criteria besides objective
tests, it would probably be sufficient to provide all children with a
single trial run, plus an hour or so’s instruction based on their
scripts. This trial should include all the selection tests, not the
intelligence test only. But in areas where competition is severe and
coaching widespread, and where selection is based solely on objective
tests, it would be desirable to give [...] parallel trials, to mark
the tests in class and to allow teachers to give further guidance for
a few periods after each trial. The teachers should also [...], and
every opportunity should be taken to inform teachers and parents of
the [...] limited amount.

1 Travers (1938) has suggested an attractive alternative. By including
some highly coachable types of items, others less coachable, it would
be possible to determine statistically which children, or which
schools, had received coaching. Unfortunately, the technique is too
complicated and seems too unreliable for practical application.

One objection that has been raised to this solution is that, if
teachers vary in their effectiveness as coaches, it would reintroduce
big differences between the performances of different school classes.
While there is some truth in this, the differences would still not be
so large as those existing at present when some children are
thoroughly coached and others have no previous acquaintance with
tests.

Another practical query concerns the effects on the reliability and
validity of the examination. If, as Watts has shown, [...] It is
indeed generally true that two or more tests (or sets of tests) are
more reliable and valid than one. But the principle hardly applies if
the results of the first test are distorted by some children having
had a good deal of previous experience, others none. In these
circumstances the later tests, when all children have had some
experience, are likely to be more trustworthy.

Curious as it may seem, the effects of coaching on reliability and
validity are so small that it is difficult for the psychometrist to
determine which tests are most predictive. If we take an extreme case
where half the candidates are coached and show an average rise of 22
points of I.Q. + En.Q. + Ar.Q., half have had none, the consequence is
only a 3 per cent reduction in the correlation coefficients. If the
normal correlation with subsequent secondary school success is 0.85,
the correlation under these abnormal conditions would still be 0.825.
This emphasises the point already made that the susceptibility of
tests to coaching and practice, and the unfairness that this produces
in individual cases near the borderline, does not imply that tests
become useless.
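
The arithmetic behind this claim can be checked in a few lines. The
first function simply applies the quoted 3 per cent reduction. The
second is a rough sketch of why the loss is so small, under
assumptions that are not in the text: coaching adds a fixed number of
points to a given proportion of candidates, it is unrelated both to
true ability and to later success, and the S.D. of the summed
quotients (here 40) is a hypothetical figure supplied for
illustration.

```python
import math


def reduce_by_percent(r, percent):
    """Apply a proportional reduction to a correlation coefficient."""
    return r * (1 - percent / 100)


def correlation_with_uneven_coaching(r, total_sd, coached_gain, proportion_coached):
    """Correlation with the criterion after some candidates are coached.

    Assumption: coaching adds `coached_gain` points to a random fraction
    of candidates, so it only adds irrelevant variance to the predictor.
    """
    p = proportion_coached
    added_variance = p * (1 - p) * coached_gain ** 2
    return r * total_sd / math.sqrt(total_sd ** 2 + added_variance)


quoted = reduce_by_percent(0.85, 3)
modelled = correlation_with_uneven_coaching(
    r=0.85, total_sd=40.0, coached_gain=22.0, proportion_coached=0.5
)
print(f"quoted reduction: {quoted:.3f}; sketch model: {modelled:.3f}")
```

Under these assumptions the modelled loss is likewise only a few per
cent, because the variance added by uneven coaching is small beside
the existing spread of scores.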

Our plan would have the disadvantage of upsetting the norms for
published group tests. But they are already rendered inapplicable
through the existence of widespread illicit coaching. Fortunately, it
would have little effect on performance in most of the individual
intelligence and attainment tests which the educational psychologist
uses in studying backward or maladjusted children. Research
investigations and surveys of children over the 10-12-year age-range
would also be affected by universal coaching, though certainly no
worse than they are at present by differential coaching. Actually, the
controversies over coaching have been beneficial in bringing home to
psychologists that previous experience of tests does make a difference
and must be allowed for in drawing any conclusions.
In the long run, however, no entirely satisfactory solution can be
expected through modifications of testing technique and
administration. So long as tests are misused for competitive purposes,
as distinct from diagnostic and research purposes, the problem will
continue. If secondary school selection became a process of
allocation, as the 1944 Act envisaged, and if allocation were a
continuing process, reflecting children’s true abilities throughout
their school careers instead of being based on one set of examinations
at 11, the incentives to coach would disappear.


Chapter Nine 
HEREDITY AND ENVIRONMENT 


We have seen that the intelligence which can be observed in daily
life, or which is measured by our tests, should not be identified with
pure inborn native capacity nor regarded as something sharply
distinguished from knowledge or skills acquired through upbringing and
education. Nevertheless, there are still divergences of view as to the
relative importance of nature and nurture, and much controversy over
the interpretation of the relevant evidence. Psychologists such as
Terman, Burt and Cattell have consistently maintained that the
intelligence we measure is mainly determined by the genes and that the
influence of environment is quite small. Others [...] J. B. Watson,
the Behaviourist, [...] type of man, talented or [...], and many
American [...] educational psychologists and the Chicago sociologists,
would appear to deny any role to heredity.


The psychologists just mentioned are all serious scientists who strive
to maintain impartiality of outlook. But only too often people’s views
on this topic are distorted by political and [...] prejudices. Those
with left-wing opinions dislike the suggestion that anyone born from
an upper- or middle-class family has any innate superiority over those
of less privileged birth, and believe that social reform and improved
education will rectify present divergences. Whereas the view often
expressed in the last century, and still occasionally heard, is that
the poor [...] benefit from, and [...], as good an environment [...].
[...] genetic constitution that for a time Mendelian principles were
rejected in Soviet Russia, and intelligence testing is still regarded
[...]. [...] human traits and abilities to people of Nordic descent,
[...] undesirable [...] psychology and biology became as perverted (in
the opposite direction) as Russian is today. Thus a highly reputable
pre-war German psychologist, E. Jaensch, descended to such nonsense
as: “The superiority of Nordic races is reflected in race differences
among chickens. The Nordic chick is better behaved and more efficient
in feeding than the Mediterranean chick, and less apt to over-eat by
suggestion. These differences parallel certain temperamental
differences among humans. The poultry-yard confutes the
liberal-Bolshevik claim that race differences are merely cultural
differences, because race differences among chicks cannot be accounted
for by culture.”

The conclusions reached in this book are middle-of-the-road. They
allow considerably more scope to the influence of upbringing than do
most previous books by British psychologists. Yet they will not please
the left-wing theorist, since they recognise that there is clear
evidence for some hereditary determination. Several different lines of
enquiry have been very extensively explored, and we will proceed to
survey their findings in turn.


PEDIGREE STUDIES 
Sir Francis Galton led the way with his studies of ‘hereditary 
genius’. In his own family and that of the related Huxleys and 
Darwins there was a remarkable galaxy of talent, suggesting that 
ability is passed from one generation to another. On making a 
systematic study of nearly one thousand men of acknowledged 
eminence, he found that a large proportion of their close relatives 
were also outstanding, far more than in the general population. At the
opposite end of the scale the most famous pedigrees to be worked out
are those of the Kallikak family and the Jukes. Though these are of
little scientific value, a summary of the latter may be of some
interest. Mr. Jukes, born about 1730 in North America, was a hunter
and a fisher, whose sons married into a degenerate family of sisters.
One hundred and eighty years later information was collected on over
2,000 descendants. Of these, 78 died in infancy and 301 were
illegitimate; 366 were paupers, 80 habitual [...], 171 [...] crimes,
including 10 murders; 175 were prostitutes and [...] with venereal
disease had infected 600 other persons. In 1915, out of 1,258 living
descendants, about half the adults were fair or good citizens; nearly
as many were ‘antisocial’, and 103 were ‘marked cases of mental
defect’. Between 1800 and 1915 they had cost the State 2 million
dollars.

Now, even apart from the obvious difficulties of assessing the traits
and abilities of people long dead, such studies are of no scientific
value, since the findings could equally well be accounted for by the
cultural intellectual environment in which the Galtons and their like
were brought up and the depressed moral environment of the Jukes.
Indeed, [...] studies similar to Galton’s were published by De
Candolle in [...] and J. McK. Cattell in America claiming just the
opposite conclusion. They showed that eminent men of science came most
frequently from the rich and leisured classes, in countries (or parts
of the U.S.A.) where there was a good system of education, [...]
libraries and laboratories, freedom of opinion and an absence of
religious censorship. One might, of course, answer that families with
high talents were more likely to settle in such favourable conditions
and to continue improving the conditions rather than that the
conditions improved them. But such arguments regarding cause and
effect would be fruitless.

CORRELATIONS BETWEEN INTELLIGENCE OF OFFSPRING
AND INTELLIGENCE OR OCCUPATIONAL STATUS OF
PARENTS

Turning to more modern work, it is obviously preferable to compare
objective measurements of intelligence rather than subjective
estimates of ability. It is not easy to persuade parents of tested
children to take tests themselves, but Conrad and Jones (1940)
succeeded in applying the Stanford-Binet to all the children and Army
Alpha to the adults in a representative sample of families in an area
in New England. They obtained an average correlation of 0.49. Many
other studies yield a correlation of about 0.5 between siblings (i.e.
ordinary brothers and sisters), and one would expect the genetic
resemblance of parent-offspring and of two offspring in one family to
be the same.1 Conrad and Jones noted that mother-child correlations
were no higher than father-child, nor like-sex higher than unlike-sex,
as might have been anticipated on environmentalist theories. Moreover,
the coefficient is much the same as that for height, which is an
attribute known to be largely dependent on heredity.

Another approach is to test the relatives of mentally defective
persons who, even though they are certified on other grounds besides
low intelligence, do usually fall below I.Q. 70. Both Burt (1955a) and
Penrose (1938) have found that only about 14 per cent of the children
of an adult defective are themselves defective or obtain I.Q.s below
70. Just about the same proportions obtain I.Q.s of 100 and over. This
regression to the mean is just what would be expected from a
parent-child correlation of 0.5. It is equally true that many parents
of normal intelligence have children who fall within the defective
range, and that the majority of defective children are born of dull
(around I.Q. 85) rather than of defective parents. However, Penrose,
Fraser Roberts (1952) and others have pointed out that the abilities
of relatives of low-grade defectives differ markedly from those of the
relatively high-grade. [...] of imbeciles and idiots are not [...]
family there is likely to be some [...] such as gene mutations,
prenatal [...], (in the case of cretins) glandular defects in the
mother, or brain disease. No hard-and-fast distinction can be drawn
between the two types, since the causation of mental deficiency is
[...], i.e. genetic, intrauterine, environmental and pathological (cf.
Clarke, 1958). But the existence of the latter, so-called exogenous
type, in addition to the former endogenous type, helps to account for
the peculiar discrepancies between parental and child ability in the
case of low-grades. It also explains why there seem to be more
defectives, especially low-grades, in the total population than would
be expected if intelligence were strictly normally distributed (cf.
p. 110).
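
The regression argument for the children of defective parents can be
made concrete with a short calculation. It is a sketch under
assumptions not stated in the text: parent and child I.Q.s are taken
to follow a bivariate normal distribution with mean 100, S.D. 15 and
correlation 0.5, and the parent is assumed to score exactly 70. The
function names are illustrative.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def child_iq_distribution(parent_iq, r=0.5, mean=100.0, sd=15.0):
    """Expected child I.Q. and its spread, given one parent's I.Q."""
    expected = mean + r * (parent_iq - mean)   # regression towards the mean
    spread = sd * math.sqrt(1 - r ** 2)        # residual standard deviation
    return expected, spread


expected, spread = child_iq_distribution(parent_iq=70)
p_below_70 = normal_cdf((70 - expected) / spread)
p_100_plus = 1 - normal_cdf((100 - expected) / spread)

print(f"expected child I.Q.: {expected:.0f}")
print(f"below 70: {p_below_70:.1%}; 100 and over: {p_100_plus:.1%}")
```

On these assumptions the expected child I.Q. is 85, and the
proportions below 70 and at 100 or over are equal, each a little over
one in ten, which is of the same order as the 14 per cent reported.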
Partly because of the difficulties of testing parents, and partly
because of the intrinsic interest of parental occupational status, the
relation of the latter to child intelligence has been more fully
studied. In Thomson’s first survey in Northumberland, the mean I.Q.s
of the children of over 100 occupational groups were obtained, of
which the following are examples (Duff and Thomson, 1923):

Clergymen                        121
Total professional               112
Commercial and business          105-110
Industrial workers               100-103
Labourers and agriculture        96-98
Hawkers and chimney sweeps       91

Similarly, Terman and Merrill (1937), in standardising the revised
Stanford-Binet scale, obtained the figures shown in Table VII.


TABLE VII

PARENTAL OCCUPATIONAL DISTRIBUTION AND
TERMAN-MERRILL I.Q.s OF CHILDREN

Group        Father’s Occupation                            Percentage   Mean I.Q. of Children
I            Professional                                   5            [...]
II           Semi-professional and managerial               8            [...]
III          Clerical, skilled trades, retail business      26           [...]
V            Semi-skilled, minor clerical and business      31           [...]
VI           Slightly skilled                               9            [...]
IV and VII   Rural and day labourers                        22           [...]


Though a different lassifi 
Mental Survey, the hase 


One can generalise that children of the upper professional grouP®. 
Usually score r Standard Deviation AR. say and those of 
the least skilled la i 


hi gain there is regression towards the mean- m 
ghest and lowest groups of fathers would be likely to $° ise 
neh above and x S.D. below the mean. We should also rea en 
that there is tremendous overlapping. A few labourers eio 
range up to 130+ LQ, and a few professional children 4° 
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to i pe eea 4 
vonage ee 7 
the greatest absolute numbers of very bright children come from
occupational groups II and V, from the working class, clerical and
retail fathers. The professional and upper business parents produce
the biggest relative proportion but a smaller total of the
outstanding intelligences of the next generation. Hence the
necessity (whether we incline either to hereditarian or
environmental theories) for the educational system to promote
social mobility by allowing the brightest children to come to the
top regardless of their origins.
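
The distinction between relative proportion and absolute numbers can be made concrete with a rough calculation. The Python sketch below is illustrative only: the group shares, the group means and the within-group Standard Deviation of 15 are hypothetical assumptions chosen for the example, not figures taken from Table VII, and each group is treated as normally distributed.

```python
from statistics import NormalDist

# Hypothetical inputs for illustration only (not figures from the text):
# "share" is the fraction of all fathers in the group, "mean_iq" the
# assumed mean I.Q. of their children.
GROUPS = {
    "professional": {"share": 0.05, "mean_iq": 115},
    "clerical_and_semi_skilled": {"share": 0.57, "mean_iq": 103},
}
THRESHOLD = 130  # "very bright" cut-off used in the text (130+ I.Q.)
SD = 15          # assumed Standard Deviation within each group


def bright_fraction(mean_iq, sd=SD, threshold=THRESHOLD):
    """Proportion of a normally distributed group scoring above the threshold."""
    return 1.0 - NormalDist(mean_iq, sd).cdf(threshold)


def bright_per_10000(share, mean_iq):
    """Very bright children contributed by a group per 10,000 of all children."""
    return 10_000 * share * bright_fraction(mean_iq)


results = {
    name: {
        "relative": bright_fraction(g["mean_iq"]),
        "absolute": bright_per_10000(g["share"], g["mean_iq"]),
    }
    for name, g in GROUPS.items()
}

for name, r in results.items():
    print(f"{name}: {r['relative']:.1%} of the group, "
          f"{r['absolute']:.0f} per 10,000 children")
```

Under these assumptions the small professional group has much the higher proportion above 130, yet the far larger clerical and semi-skilled groups supply the greater number of such children, which is the pattern the paragraph describes.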

The correlation of father's occupational level with child's I.Q.
is consistently found to be about 0.35; that is, a little lower than
the correlation for parent's intelligence (cf. Fleming, 1943). But
on tests such as Gesell's, given to 0-6-month infants, parental
status correlates zero or slightly negatively; the higher-class child is,
if anything, slightly poorer in early sensory-motor development.
The relationship becomes increasingly positive from 1 to 4 years,
as the tests become more conceptual and verbal in content. This
does not necessarily show the increasing effects of environment
between 0 and 4; it would be equally plausible to say that the
genes underlying intelligence do not come into operation until
the higher brain-centres mature.

Now, although all the findings so far quoted are consistent with
genetic determination, they could equally well be explained
through the better upbringing that high-intelligence or high-class
parents can provide than lower ones, their richer
vocabulary, more favourable attitudes to schooling, etc. Indeed,
the Chicago sociologists, Davis, Havighurst and Eells (Eells et al.,
1951), take precisely this view. They find that middle-class
children do better than working-class even on spatial and non-verbal
intelligence test items, though particularly on vocabulary
items. By 10 years the difference between the highest and lowest
economic groups amounts to 4 years of Vocabulary M.A.
And they interpret this to mean that the tests largely measure
training in middle-class linguistic culture. They have attempted to
construct a series of 'culture-free' tests, but so far apparently
without much success.1 While their work provides a valuable new
light on cultural differences between classes in child-rearing
practices, which might well affect intellectual development, it
ignores the possibility that class differences in intelligence might
be to some extent innate.1

1 The Davis-Eells Games test, based mainly on comprehension of
comic-strip pictures which are applied orally, still tends to show
class differences.

The trouble is, of course, that children's environment is usually
of a piece with their heredity. The more innately intelligent
parents, who pass on superior genes, also tend to provide the better
environment; hence the difficulty of separating the contributions
of nature and nurture. One of the few really conclusive investigations
which did manage to do this is that of Lawrence (1931),
who measured the intelligence of illegitimate orphans. These
children had had no contact with the fathers, and had been
separated from their mothers before the age of 6 months. However,
the occupations of the fathers were known, and there was
still a correlation of about 0.25 with the child's I.Q. This provides
definite confirmation of genetic class differences. But equally, in
so far as it is lower than the figure of about 0.35 for children
brought up by their own parents, it proves the influence of
environment.


Further very strong evidence is provided, not by the ae ode 


tion would be expected to produce quite a
wide range of intelligence in the same family; it would be
much more difficult to explain this on environmental theory.
Parents do of course often favour, or pay more attention to, one
child than another, though not necessarily the one who turns out
the brightest. Such favouritism could hardly bring about differences
of 30 to 40 I.Q. points. Thus the fact that professors occasionally
have very dull children, or unskilled labourers very bright
ones, points inevitably to the importance of hereditary factors.
“ifs est we observe even larger differences between orphan 

- ougn they are reared in highly standardised institutional € 
vironments, The uniformity of their upbringing may hava di 
; “S not seem to make them much mo 
Sah ae in LQ. than children brought up in more vari? 
bri T me nother argument that school teachers pers 

ring forward is that huge differences in mentality persist betwe 

e children in their classes despite all their efforts. However, a” 


1 s s 
ometimes disputed by British sociologists also (F! it 


re 


d 


This possibility į 
1958). ears 
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hereon cul answer here that such differences are firmly 
e e mu ive i ingi 
ae Feet more pervasive influence of upbringing out- 
One further point should be made: though we have admitted 
that the environment affects the child’s intelligence, we should not 
forget that the child’s intelligence often affects the environment. 
Parents who find one of their children showing greater intellectual 
promise than another are likely to provide the former with more 
books and other intellectual stimulation. He is also likely to get 
into higher junior school streams and to win a grammar school 
place and this, as we shall see later, may still further widen the gap
between him and the less gifted.


STUDIES OF TWINS 

Identical or monozygotic twins are produced by the division of 
the mother’s ovum after it has been fertilised by the father’s 
sperm. Thus they possess the same genes and the same hereditary 
potentialities. Fraternal or dizygotic twins, like ordinary sib- 
lings, grow from separately fertilised ova. Hence there are even 
chances that they will be of the same or opposite sex, whereas 
identicals are always like-sex. Again, if one of them receives 
parental genes underlying high intelligence there are even chances 
that the other member of a pair will, or will not, do so. Among 
more distant relatives such as two first cousins, or a grandparent 
and grandchild, the hereditary resemblance is halved again, i.e.
there is a one-in-four chance of any gene being present in both.
Now on correlating the intelligence test results of pairs, a
typical result is:

Identical twins         0.90
Fraternal like-sex      0.70
Fraternal unlike-sex    0.60
Siblings                0.50
First cousins           0.27
Unrelated children      0.00

The resemblance of identicals is almost as high as the reliability of
the tests employed, i.e. as high as the correlation of one child's
I.Q. with his own I.Q. on a parallel test. As hereditary resemblance
decreases, so does the correlation, and this fact is
usually quoted as strong evidence for the inheritance of intelligence.
But it will be noted that fraternal twins, especially like-sex
ones, correlate more highly than siblings, although
their genetic resemblance is the same.1 Similarly the higher
correlation between identicals than fraternals could plausibly be
attributed to greater similarity of environment. Identicals are
usually dressed alike and treated alike; and case-studies show
that they seem to have a peculiar bond of sympathy with one
another, always doing things together and sharing the same
interests. Fraternals, however, are more often in different classes at
school, and quarrel as much as siblings habitually do.
A crucial test is provided by identicals brought up in different
environments. Naturally these are rare, but Newman, Freeman
and Holzinger (1937) traced 19 pairs in America who had been
separated at an early age, and Burt (1955) has tes ENEO). 
In Newman's research the median difference in Stanford-Binet I.Q.
was 7½ instead of the 5 points normally found for pairs brought up
together, and a few pairs


points. Moreover, the 
the two homes, on th, 

€ authors conclude 
are about as large as 


: . Perhaps 
by other Psychometrists who have analysed their data 


hed by 
and sibling data is that publishe i 


environmental differences between families;
3 per cent to differences in upbringing between children in the
same family;
18 per cent to joint hereditary-environmental contributions, i.e.
to the fact that families with good heredity usually provide
superior environments.
This conclusion is reasonably consonant with that reached by
other writers, such as Burt in England and Burks in America, and
it brings out the point that


on- 


hat 
pate mew 

However, Burt's method of analysis yields a
higher weight to heredity. He bases it on the following correlations
for his own groups of twins (the corresponding figures from
Newman's research are added in brackets).

1 A partial explanation is that fraternals are tested at the same time,
siblings at least a year apart; thus the tests used are more closely
similar for the former.

TABLE VIII

BURT'S AND NEWMAN'S CORRELATIONS BETWEEN
PAIRS OF TWINS, SIBLINGS AND UNRELATED CHILDREN

Type of Relationship               Height         Intelligence   Attainments
Identicals reared together         .957 (.981)    .921 (.910)    .898 (.955)
Identicals reared apart            .951 (.969)    .843 (.670)    .681 (.507)
Fraternals reared together         .472 (.930)    .526 (.640)    .831 (.883)
Siblings reared together           .503           .491           .814
Unrelated children in same home    -.069          .252           .535

These show that the
resemblance in intelligence between identicals reared apart is far
higher than the resemblance between unrelated children brought
up together; whereas in the case of school attainments the correlation
of 0.535 attributable to the same environment with
different heredity is much more nearly equal to that of 0.681 for the
same heredity with different environment. From these figures Burt
claims that the contribution of heredity to individual differences
in intelligence amounts to 83 per cent, in height 92 per cent and
attainments 40 per cent. But correlations for small numbers are,
of course, unstable; and, applying the same method of analysis to
Newman's data yields: intelligence 75 per cent, height 73 per
cent, attainments 62 per cent. This would support our claim, in
Chapter Two, that there is little essential difference between
intelligence and attainments.
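
The comparison drawn from Table VIII can be laid out as simple arithmetic. The Python sketch below takes Burt's own correlations from the table (Newman's bracketed figures are left out) and, for each trait, subtracts the correlation for unrelated children reared in the same home from that for identicals reared apart. This difference is only an illustrative index introduced here; it is not Burt's method of analysis, which the text does not set out, and it does not reproduce his percentages.

```python
# Burt's correlations as given in Table VIII.
TABLE_VIII = {
    "height":       {"identicals_apart": 0.951, "unrelated_same_home": -0.069},
    "intelligence": {"identicals_apart": 0.843, "unrelated_same_home": 0.252},
    "attainments":  {"identicals_apart": 0.681, "unrelated_same_home": 0.535},
}


def heredity_environment_gap(row):
    """Same heredity, different environment MINUS same environment, different heredity.

    An illustrative index only (an assumption of this sketch), not Burt's analysis.
    """
    return row["identicals_apart"] - row["unrelated_same_home"]


gaps = {trait: round(heredity_environment_gap(row), 3)
        for trait, row in TABLE_VIII.items()}

for trait, gap in gaps.items():
    print(f"{trait}: gap = {gap:.3f}")
```

The gap is widest for height, still wide for intelligence, and narrow for attainments, which is the sense in which the two attainment correlations are said to be "much more nearly equal".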


CONSTANCY OF THE I.Q. AND OF ATTAINMENTS 

If it were true that intelligence test results depended almost
wholly on innate ability, the I.Q. would be expected to remain
constant throughout life. We have already seen that the degree of
constancy is fairly high over short periods, much less so over long
ones, though also that much of this variability may be due to
imperfections in the tests and only part of it to alterations in the
testees.
Interestingly enough, the degree of constancy does not seem to
be very different for school attainments. For though we tend to
think of children as rising or falling considerably in their school
performance in successive classes, this is more true of
particular subjects than of all-round success; and here, too, much
of the variation arises from the unreliability of school examinations.
In one investigation of some 1,200 pupils in primary
schools Blandford (1958) found somewhat higher correlations
between Attainment Quotients over 2½ years than between
Intelligence Quotients.2 It is likely, of course, that greater alterations
in attainment may occur when children change schools and
advance to new or more complex school subjects; hence the
value of intelligence tests at the end of the primary stage in
predicting secondary school performance. But it is not true
that intelligence remains entirely stable and attainments are
wholly at the mercy of environment.
We will now ask how far changes in intelligence have been
shown to result from particular environmental conditions.

s 
EFFECTS OF UNUSUAL Ta Oe and gipsy 
In 1923 Gordon published a study of canal boa 


t 
A Binet tes 
children who had very little if any schooling. rhe se of 6, but 
they were of nearly average intelligence up to 
ereafter their M.A, 


Jac 
eas 

s failed to progress at the oim oppe is 
of schooling made itself felt, and their LQ.s a 5 
On performance tests there was no such marked dif 5 affected A! 
was taken to mean, not so much that intelligence i legitimate y 
schooling, as that a verbal test such as Binet canos nal opp 
be used unless children have had normal edutas com 
tunities. Similar findings have been reported from a cil in Brat 
munities in the United States. A recent study by Matrices) may 
showed that non-verbal ability (on a test similar to 277,000 Po 
be affected as much as verbal ability. He tested some 


oling 
choot 
years, many of whom had had no s$ 
whatever. The scores in illi 


normal, but thereafter th 


por 


bility of "ho 
: to 
* This unexpected finding may be due partly to the gaaei hours 
attainment tests than the intelligence test; the former too 

give, the latter only half an hour. 
(1936) on slum clearance. Over two hundred 5-8-year children
were tested in a poor and overcrowded part of Glasgow, shortly
before they moved to a new housing estate. On retesting after 1
to 1½ years in the new environment they showed a small, but
statistically significant, increase of 1½ I.Q. points, whereas a
control group that stayed behind showed no significant improvement.
It is perhaps surprising that the newer schools and the more
healthy and stimulating environment could have even as much
effect as this in a relatively short period.

Some of the most extensive enquiries have dealt with foster
children. F. N. Freeman tested brothers and sisters (who could be
regarded as potentially of the same average intelligence), some of
whom were placed in good and intellectually stimulating homes,
others in poorer homes. The estimated effect of the difference
attributable to such different environments was 10 to 20 points of
I.Q. This may not sound very large, but the correlation between
siblings in different homes fell from the usual figure of 0.50 to
0.25; moreover, a higher correlation, namely 0.37, was obtained
for unrelated foster children in the same home. These findings
were criticised on the grounds that the more intelligent and
better-class foster parents would tend to take more trouble in
selecting bright children for adoption. Other similar studies by
Burks and Leahy confirmed the existence of selective placement,
and provided additional evidence that the environmental influence
of the foster-home on the child's I.Q. is limited.1

From 1938 on, a series of studies were issued by Wellman,
Skeels and Skodak at Iowa University, purporting to show far
greater effects of altered environment on the I.Q. For example, it
was claimed that 26 children, whose mean I.Q. at 3 years was 90,
Were sent to an orphanage where there was an extreme p g 
intellectual stimulus, no play materials, and an ekee nur: > 
in charge. When retested 2 years later the mean I.Q. had fallen
to 74, and many of the children fell below the borderline of
mental deficiency. On the other hand, a group of
children of subnormal mothers (said to average I.Q. 88), who had
been placed in good foster homes, were found to obtain a mean
I.Q. of 116, that is 28 points higher. Skeels actually concluded that
when people adopt young children there is no point in
enquiring into the children's heredity. Other investigations dealt
with nursery school attendance. Thus it was claimed that I.Q.s
tend to rise from September to April when children are attending
such schools but not during April to September, which includes the
holiday. It was even found that university students who,
years before, had attended nursery schools, did better on
intelligence tests than those who had not.

1 The work on foster children, together with other
relevant investigations, is reviewed in the 27th and 39th Yearbooks of the
National Society for the Study of Education (Bloomington, Ill.: Public School
Publishing Co., 1928 and 1940).


Such assertions were naturally challenged, particularly by
followers of Terman (McNemar, 1940). The Iowa investigators do
not seem to have used proper controls, and were sometimes
careless in their choice of tests and in their statistical methods and
calculations. Variations in the norms of tests for young children
were ignored. The favourable effects of nursery school attendance
(which are not confirmed by other more scientifically planned
studies) might be due to the dependence of the tests on play
materials such as are freely used in such schools, or else to
better emotional adjustment to social situations such as taking a test.
If there were any superiority in the ex-nursery school students, this
might arise from greater test sophistication, i.e. to their being


foster-home data, it is claimed, point to effects of
the order of 5 to 10 points, much the
same as Freeman and others found, rather than 20 to 30
points.


CHANGES AMONG MENTAL DEFECTIVES

The diagnosis and treatment of mental defect has traditionally


t 
; i t tha 
dical profession, with the result t 


ds, however, has sho 


ls to vary in intelli 
intellectual and 
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ee hn less than 7 per cent d T 
oa ilts. Over 80 per cent ee for eee 
>porting in j a > east partially self- 
quite high up th , while mostly low-grade, did 
émployed sere occupational scale. And 40 per FR ES 
40 some 80 per same job for 3 to 20 years. At the age of oe 
n eee cent were married, and the great males A th e 
small group of a ae be progressing normally at Tod Ta a 
risen from an adults whose intelligence was tested, the had 
ee LQ oe sees LQ. of 58 toa a 
ount ORTA s, however, i i i 7 
eae regression effects and PER eae ca 
Another in sia bias 
was that vestigation which aroused wi 
aged oe Schmidt (1946) in Chicago. E 
attended a A had been classified as feebleminded, and wh 
designe d ne ool for 3 years whose curric fully 
Improve aca ees emotional and social adjustment, as well as to 
Were R a and manipulative skills. Subsequently they 
Mean Sta ates ‘or If to 43 years. Over the 74-year period the 
group Ea ea LQ. rose from 52 to 89, whereas a contro 
X to 56. Man. EA ordinary special schools dropped from I.Q. 
complete SA PA em returned to high school, and 27 per cent 
van in emplo year course. When last followed up, 83 per cent 
N ork. By Berle two-thirds in clerical or moderately skilled 
¢ mployment a ast the controls showed very poof educational an 
IN tests of ecords. Equally striking improvements are quoted 
Some a stability. 
we e ees lopies frankly question the authenticity of the 
aenificance ar S a admittedly it is juate the 
paal adjustm chmidt’s test results. But her findings 0” occupa- 
ould appea SH are not dissimilar to those of Charles. Thus it 
r that, by providing the right kind of educational 


Stow up į 
P into reasonably capable adults. On 
i d, or who attend the 


also kn 
know th: 
at those who are institutionalise 
most the same level 


orking W1 
d allowed to live 


or ev 
en to eee 
decline in ability. 
ith imbecile women, 
into jobs 22 i 


S 
outside an institution gained in two years some 6 points
over control cases who remained institutionalised. But it
may well have been that the experimental group possessed more
initiative and were better adjusted than the controls, so that they
would be more likely to improve in any case. Clarke (1958) has
published several studies of feebleminded youths and young
adults, which indicate that bad home conditions in early life play
some part in retarding their development. In the first of these,
patients were tested with Wechsler-Bellevue on entering the
institution and again after 2 years. Among those who (unknown
to the tester) came from a reasonably good home there was
an increase of 4.1 I.Q. points, which was just the same as the normal
practice effect. But among those who came from bad homes,
where there had been gross neglect or rejection, the increase
averaged 9.7 points; thus the 5.6 points difference could reasonably
be attributed to the improved conditions. Further retesting led
to further, smaller, rises; but the provision of special vocational
training did not appear to make more than another 2 to 3 points
difference.
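
The reasoning behind the 5.6 points is a subtraction of the practice effect from the observed rise. The short Python sketch below restates it, using the two gains quoted above; the function name is hypothetical, and the sketch assumes, as the text does, that the good-home gain of 4.1 points represents practice effect alone.

```python
def gain_beyond_practice(observed_gain, practice_gain):
    """I.Q. points of gain left over once the practice effect is removed."""
    return round(observed_gain - practice_gain, 1)


GOOD_HOME_GAIN = 4.1  # taken in the text as equal to the normal practice effect
BAD_HOME_GAIN = 9.7   # mean gain of patients from very bad homes

excess = gain_beyond_practice(BAD_HOME_GAIN, GOOD_HOME_GAIN)
print(f"Gain attributable to improved conditions: {excess} points")
```

The remainder is attributable to improved conditions only if practice affects both groups equally, which is itself an assumption.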


ENN jga- 
Note that the range of increases reported by British mea 
tors is far smaller than that claimed by Wellman and a wer- 
though the former were, of course, dealing with older and Jo east 
grade subjects, and followed them only over a few years. eis o 
eir evidence is sufficient to negate any complacent accep pa to 
the LQ. as unchangeable. But it would be equally false to recipe 
the opposite extreme, since we certainly do not have any i 
Or converting persons of defective intelligence into norma’ 


EFFECTS OF SECONDARY SCHOOLING AND
ADULT EMPLOYMENT

There is much stronger evidence for the influence of secondary
and later education on the I.Q. than of nursery or primary. As
pointed out in Ch, t A R S are 
higher than those of} er One, average adult tes 


tinue 
at high school and college students co? 


e 
f 
o 
8 t 


G 
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Age 
other intellectually stimulating conditions, though probably
never beyond about 25-30 years, and that when education
ceases a decline sets in. The majority of the population,
including almost all average and dull pupils, leave
school around 14 or 16, and tend to enter jobs and indulge in
leisure pursuits which do little to 'exercise their minds'. The more
privileged minority continue their education to 18, 22 or later,
and mostly enter jobs and pursue interests which make more
demands on their minds. Thus their increase presumably roughly
balances the majority's decline, producing a steady average
level in the total population from 15 to 25. This is illustrated in
Fig. 7, which shows the characteristic 'fanning out' of test norms
among adults. The more intelligent individuals tend to go on
increasing longer and decline more slowly, whereas the least
intelligent stop earlier and decline more rapidly. Similar evidence
was obtained by classifying the Matrices scores of some 90,000
naval recruits during the war according to age and civilian
occupation (Vernon and Parry, 1949). The higher
categories such as clerks, electrical or woodworkers not only
scored higher at all ages than mates and labourers, but started to
decline later and more slowly.

Direct indication of the effects of advanced education is provided
by Lorge's (1945) and Husén's (1951) investigations, where
the same individuals were tested as children and as adults. Lorge
reckoned that, at 34, adults who had received further education
were 2 M.A. years superior to others who possessed the same
intelligence at 14 but had had no further schooling. Husén tested
men entering the Army at 20 whose I.Q.s at 10 years were
known, and found that those who had matriculated had gained
12 I.Q. points relative to those who had had no secondary
education.


intel- 


; kes ? 
make 
That the quality rather than the length of schooling 7 which 
difference ig indicated by a research (Vernon, 19574) evict ay 
e boys in the 14 secondary schools of an d with te 
at 14 years, and their results compare! ears eatlie i 
TQ.s at the time ry school selection 3 to 4 y differe” 
i i ences between boys enter ferences Hh 
regression effects, there were now st’ schoo. 
boys in the ‘best’ and ‘wor: 


are? 
and technical school boys had app 
gained 7 points over the combined modern schools.1 Much of
this difference might well be due to the grammar school boys
coming from better homes, with more favourable attitudes to
education; but this would none the less be an environmental effect.
The results supply strong grounds for suspecting that some
secondary modern and all-age schools actually inhibit the full
growth of intelligence over the 11- to 15-year period, if the
pupils and staff are bored or resentful and the teaching
mechanical; whereas other modern schools, together with most
technical and grammar schools, are relatively successful in stimulating
the mind and bringing out potentialities more fully. According
to a further research by Lovell (1955), it is mental flexibility
in particular, and the capacity for forming new concepts, which
are affected by the adolescent's intellectual and emotional
circumstances, that is those very capacities which are most subject to
deterioration in adulthood (cf. p. 86).
An objection may well be raised to these conclusions, namely
that they run counter to psychological teaching about transfer of
training. Has it not been proved that mental faculties like reasoning
and memory cannot be trained in general, and is not this
notion of 'exercising' or 'stimulating' the mind suspiciously close
to the discarded notion of formal discipline? This is true. But it
should be remembered that the alterations in intelligence described
above are rather small, and quite difficult to demonstrate except
by surveys of large numbers of pupils or adults. Moreover,
psychologists realise now that the initial reaction against transfer
of training went too far; a majority of experiments have actually
demonstrated a good deal of transfer under some conditions.
Again, the most relevant experiments, those of Thorndike and
his collaborators (Brolyer, 1927), although they are usually
quoted as disproving any formal disciplinary effect, in fact gave
some positive support. Thorndike compared the intelligence test
changes among high school pupils who took a variety of school

fi * This finding is disputed by Pidgeon and Yates (1957); ho 
igures are adjusted for expected regression 
Hens It is interesting to speculate whether the strong 4 

erman-Merrill L.Q.s at 11 years (cf. fin. p. 55) may not be hath 
Of junior schools in stimulating the intelligence of above-average P CPES 
have a chance of success in the selection (or Scottish. qualifying) examinations, 
It appears to be considerably more marke than in America. 
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i : es were 
“subjects during a year’s ordinary schooling. The ce ee 
very small, but on the whole the biggest rehire i apr Enio 
occurred among those who had taken aritl DE eave 
physics and chemistry; Latin, French and socia s is 
more inconsistent, though predominantly aan EA 
whereas there was a relative decrease among those ai fois “are 
art, cooking, sewing, stenography, biology = ae san 
Unfortunately the investigation did not include a R : Feie 
had no secondary schooling at all over the period, bu Ai 
fairly certain that they would have shown less improve } 


or a decline. 


DISCUSSION 


ings 
Many psychologists, while accepting the experimental pies 
we have outlined in this chapter and in our accoun eee 
differences (P- 174), would assert that they show Se fn 
effects upon intelligence test performance rather han BE oE 
intelligence itself, This is a difficult point, particularly s facility 
have agreed that coaching or practice at tests does improve intelli- 
at such tests without necessarily improving underlying 


intelligence. But we would suggest that disagreements among
psychologists are largely verbal in nature. R. B. Cattell and others who
stress heredity prefer to keep the term 'intelligence' for
Intelligence A (cf. p. 34), whereas we regard this innate potentiality
as of theoretical interest (although its existence is clearly
demonstrated by some of the evidence cited), and apply 'intelligence'
to the developed ability which can be observed and
measured. Maybe it would save confusion if the term was
abandoned as being liable to mislead teachers and the public; and
we have seen that there is a move in this direction.

We should realise, however, that when the layman talks of a
child as being intelligent and thinks he is referring to 'native wit',
he is really describing Intelligence B, that is, much the same
product of heredity and environment as the tests measure. The
child brought up in a very unfavourable environment does not
merely obtain a poorer I.Q. than he might otherwise have done
on Terman-Merrill. He is genuinely poorer at school learning, in
the level of employment for which he is fitted, and in other
behaviour of an intellectual character. By contrast, a child who
has been coached on group tests is superior only in a rather
specific type of behaviour, and we have no evidence that his


criticisms of intelligence tests—that 
ences. The average middle-class child admittedly has an advantage 
at such tests, partly because of his upbringing and partly, as we 
have seen, because of superior inheritan 


irrelevant from the p. 
that, on the average, he is capa 
and that when, for example, he enters the Services, he tends to be
quicker at learning and more capable at any skilled job. This was
proved time and again during World War II. The same is true of
the working-class child whose intelligence test scores and educational
achievements are superior, despite an unfavourable background.

It is in this sense, then, that intelligence is a product of both
heredity and environment during the early years of childhood, and
is susceptible to stunting or to further stimulation during
adolescence and early adulthood. In this sense, also, its proper
growth may be inhibited by emotional maladjustment, so that big
increases in I.Q. (paralleled by improvements at school and
elsewhere) are often reported as a result of child guidance treatment,
or of the treatment mentioned by Schmidt and Clarke. This is no
revolutionary doctrine; for geneticists do not regard the genes as

i physical attributes. They 


attribute to develop under 
There may still be some co. 
of hereditary and environmental con 
stressed the importance of upbringing; 
education, whereas the careful statistical analyses of Burt an 
others appear to demonstrate the overwhelmin: 
heredity. Such studies, however, have generally 
on children who were fairly homogeneous, an 
vironmental stimulation did not vary Yoy. widely. M 
Separated identical twins were brought up in quite similar homes; 
Possibly some were not even separated until some time after 
* Admi iti imes di distinguish such increases from those 
due ea aie is sometimes i aion When he first comes to the clinic. 
birth. Within any one culture, such as our own or the American,
the apparent effects of environment are reduced because all children
tend to hear and learn much the same language and to come in
contact with the same pictorial or other symbols. They all
have the opportunity for acquiring much the same simple concepts
of space, time, number, etc., from 0 to 4 years. From 5 years
onwards they receive a more or less standardised education which
also trains them to attend to oral or printed questions and to
answer them quickly. Such influences do, of course, vary appreciably
between different social classes and different schools, or
even between children within any one family, though apparently
not enough to raise the environmental component in the I.Q. to
more than 20 to 25 per cent. The simpler concepts which the
pre-school child is building up are probably provided almost as
well in a poor, overcrowded, as in a rich, cultured home. Emotional
influences perhaps have a greater effect than intellectual
ones at this age. Thus there is some evidence that intelligence
develops better in a 'democratic' home atmosphere than when
parents are either too autocratic or rejecting, or too indulgent and
over-protective (cf. Baldwin, 1945).
With children generally, then, heredity as Ypi per 

vironmental factors common to all has something li “begins to 
cent influence. But the heterogeneity of environment 


ttet 
actually an advantage; it pr odes ng 
Predictions. But we are not entitled to apply tests for 
genetic differences, except in those rare circumstances such as
Lawrence's experiment (p. 144), where the environmental
variations have been controlled. The difficulties of this control are well
illustrated in our next section.


FAMILY SIZE AND THE DECLINE IN NATIONAL 
INTELLIGENCE 


It has been shown by the work of Burt (1946), Thomson,
Fraser Roberts, Cattell and others that children from large families
tend to have lower intelligence than those from small ones. In the
present writer's survey of Army recruits [illegible] were only
children [illegible] sibling the figure declined till those from
families of 13 and over averaged only 87. Indeed, a negative
correlation of -0.2 to -0.3 is regularly found in most western
European countries and North America, though the trend is
sometimes reversed among the more intelligent, middle- and
upper-class parents, depending partly on the family allowances
granted by the State, partly on the current attitudes towards birth
control and to having children. Note that this differential
birth-rate is not merely a matter of social class. Labourers do tend
to have larger families than skilled workers, and skilled workers
than professional people; but within any one class, such as skilled
workers, the small-family children obtain higher I.Q.s than
large-family ones. Naturally a correlation of around 0.3 permits of
great variations. There are still some large families of very
intelligent children, and some dull only children.
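
As a rough illustration of why a correlation of this size leaves so much room for exceptions, the sketch below squares the coefficient to give the proportion of variance that family size and I.Q. share. The function name is an assumption introduced here for illustration, and the calculation presumes a simple linear relationship between the two measures.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance in one measure linearly associated with the other."""
    return r * r


# Correlations of the size quoted in the text (family size against I.Q.).
for r in (-0.2, -0.3):
    pct = 100 * shared_variance(r)
    print(f"r = {r:+.1f}: roughly {pct:.0f} per cent of the variance is shared")
```

On these assumptions family size goes with less than a tenth of the variation in I.Q., which is why large families of bright children and dull only children remain common.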
However, the trend causes serious concern to population experts
and eugenists, since it would appear that the brightest stocks are
generally failing to reproduce themselves and are being
progressively swamped by the less bright. [illegible]
going on for a very long time, but it has become more marked in
the twentieth century because the differential death-rate (which
used to wipe out more lower than upper-class children) has been
reduced by advances in public [illegible] and because [illegible] family
[illegible] Fraser [illegible]
latter tend [illegible] in life, to space them out
[illegible] and to stop sooner.

It is quite easy to calculate from the [illegible]
generation the probable distribution of intelligence in the [illegible]
generation, and Table IX shows a cautious estimate made by
Burt from 1920 figures up to the year 2000. The decline of a 


TABLE IX
I.Q. DISTRIBUTIONS OBTAINED IN 1920 AND
PREDICTED AT LATER DATES ON THE BASIS OF
FAMILY SIZE


i; No. 9 Proportions 
Intelligence Level LQ. Proportion Children per expected int 
in 1920 Family 1950 
ATES re (ea jae a 
Superior (scholarship, 08 
etc) | 4304. 1:8 2:3 ie 76 
Good -  ./ 115-29] 122 27 103 999 
Average + 2 Aoa 35-1 3:3 33:4 40:5 
Average— = | 85-99] 37.5 36 oap 179 
Dull and backward ` 70-84 11:9 4:2 142 
Very dull (feeble Below 33 
minded, ctc.) | 79 15 4:7 23 
ae 
Average LQ. 100-0 age ae 


1½ points in the average I.Q. per generation is bad enough but
even more striking are the predicted effects at the top and bottom
ends. The numbers of children with I.Q. 130+ are expected to be
halved in 80 years, and of those below 70 to be roughly doubled.
Some supporting evidence was provided by the Royal
Commissions on mental deficiency in 1907 and 1929, which claimed
to discover a big increase in the numbers of defectives in the
population between these dates. However, the figures have been
widely disputed, because of the difficulties of ensuring complete
ascertainment. Hence the Scottish Council for Research in
Education (1949) undertook to apply the same test to all Scottish
11-year olds in 1947 as had been used 15 years earlier. The gap
of about half a generation was, of course, rather short, but with
such large numbers a decline of even half a point should be
noticeable. Actually, as explained in the previous chapter, there
was a rise in the mean group test score, probably attributable to
greater test-sophistication of many of the second group. But
representative samples of 7-800 had also been given Stanford-Binet
and/or Terman-Merrill, that is, individual tests which were
unlikely to be more familiar in 1947; and from their results it was 
calculated that there had been no change either up or down.
Despite the most careful analysis of the data, it has not been possible
to decide why the predictions of a decline were falsified. Some 
geneticists consider that nature provides a kind of biological 
stabilising or compensating mechanism. Alternatively, improve- 
ments in child health and education over the period might suffi- 
ciently raise the average I.Q. to mask a small genetic decline. But 
if the low intelligence of large-family children were merely an 
environmental phenomenon, then of course no decline at all 
would be expected. It might well be argued that family circum- 
stances and the amount of attention and stimulation given to each 
child are impaired when the number of children rises above two. 
If this were the complete explanation, the later children should 
perhaps be less intelligent than the earlier. There is, in fact, a 
slight tendency for the first and last child in any family of three or 
more to be more intelligent than the middle ones, and these are 
the ones who often get most parental attention. But there is no 
further correlation between I.Q. and position in family. Thus the
only safe conclusion is that we do not know. There may be a
genuine, though small, tendency for the genetically less-intelligent
stocks to be more fertile which will need to be watched. But so 
far this has had no demonstrable effect on Intelligence B. 

ST! i t that dull parents do not only have dull 
ES mi have a eE al ones who E sel abe ae a eit 
relatively bright or near-to-average Ones ee en eae epea don 


asis of regression, and si 
arge numbers. Certainly something oF t : 
case of height and other physical qualities, which are very 
determined, and are likewise negatively correlated wi 
certain that any marked decline in the average 
past 5o years or so would have been detected. 




Chapter Ten 
SOME RESULTS OF MENTAL TESTING 


DISTRIBUTIONS OF INTELLIGENCE AND 
ATTAINMENTS 


THE conventional I.Q. categories and their approximate
proportions in the total population are shown in Table X. These are


TABLE X
THE DISTRIBUTION OF INTELLIGENCE


I.Q.        Category                              Per cent
130 & over  Very Superior                          2½
120-9       Superior                               6½
110-19      Above Average or Bright Normal        16
90-109      Average                               50
80-89       Below Average or Dull Normal          16
70-79       Dull                                   6½
40-69       Highgrade Feebleminded or Moron   }
20-39       Lowgrade Defective: Imbecile      }    2½
0-19            "        "    : Idiot         }
based on the assumption of a normal distribution, with a Standard
Deviation of 15, an assumption which, as we have seen, is seldom
exactly fulfilled. The categories should not be taken too literally.
For example, there is scarcely anything to choose between 109
(average) and 110 (above average); an individual can alter
[illegible] down on retesting. In particular, a low
I.Q. does not necessarily show mental defect or feeblemindedness.
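
The proportions in such a table can be checked against the normal-curve assumption. The sketch below is only an illustration: it takes a mean of 100 and a Standard Deviation of 15, treats the category boundaries as points on a continuous scale (so that "90-109" runs from 90 to 110), and uses band labels chosen here for convenience.

```python
from statistics import NormalDist

# Assumed model: I.Q. normally distributed, mean 100, Standard Deviation 15.
iq = NormalDist(mu=100, sigma=15)

# (label, lower bound, upper bound); None marks an open-ended band.
BANDS = [
    ("130 and over", 130, None),
    ("120-129", 120, 130),
    ("110-119", 110, 120),
    ("90-109", 90, 110),
    ("80-89", 80, 90),
    ("70-79", 70, 80),
    ("below 70", None, 70),
]


def band_share(lo, hi):
    """Proportion of the population falling between lo and hi."""
    upper = 1.0 if hi is None else iq.cdf(hi)
    lower = 0.0 if lo is None else iq.cdf(lo)
    return upper - lower


shares = {label: band_share(lo, hi) for label, lo, hi in BANDS}
for label, share in shares.items():
    print(f"{label:>12}: {100 * share:5.1f} per cent")
```

Under these assumptions the middle band comes out at very nearly half the population and the two extreme bands at a little over 2 per cent each, which is the pattern the conventional categories are built on.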

Categories of attainment in reading, arithmetic and other
subjects are not so well established. But generally, Educational
Quotients of 115 upwards (16 per cent) would be considered
Superior, and 85 (16 per cent) or 80 (9 per cent) downwards
Backward. A large-scale survey of reading comprehension ability
among school-leavers and adults, carried out by the Ministry of
Education (1950), attempted to fix acceptable, even if arbitrary,
borderlines for literacy in terms of Reading Ages. Table XI

TABLE XI


DISTRIBUTIONS OF READING ABILITY AMONG 
15-YEAR-OLDS 


Grade of Reading All 15- | Grammar-| Modern | Post-war 
Reading Age Year Technical | Schools Norms 
Superior 29-6 1:5 14-7 
Average + 57:5 25:3 37-01 
Average — . 12:0+ 25:5 10:9 
Backward. 9:0+ 21:8 2:0 
Semi-literate . 7:0+ 46 0:0 
Illiterate . | Under 7:0 1:6 0-0 


shows the proportions found in 1949 for all 15-year-olds, and for
selective (grammar and technical) schools and non-selective
(modern and unreorganised) schools. These figures are based on
pre-war norms, and the last column shows the proportions
according to post-war norms; here, of course, the percentages of
Superior and Backward or worse approximate closely to the
expected 16 per cent.1 Note the strong differentiation between the
different types of school. Some 40 per cent of all secondary modern
school-leavers were backward or worse by pre-war standards,
though the number of complete illiterates was much smaller than
is sometimes feared. Later surveys (Ministry of Education, 1957)
with the same test showed that the proportion of Backward or
worse [illegible] per cent over the next 8 years,
although pre-war standards had still not been fully regained.


Until some agreement is reached on definite categories of
attainment in other subjects, there is little point in trying to
describe how they are distributed in the population at large.
However, some indication of standards among school leavers and
young adults is given by the following list of spelling words and
arithmetic sums. These fall approximately at the 95th, 85th, ...
5th percentile levels; that is, the first word or sum would be
roughly characteristic of the attainments of the best 10 per cent of
the population, the last one of the weakest 10 per cent.

* The dividing-line between Average+ and Average- was taken as 14: 
years for post-war leavers because of the change in age of leaving. 


Percentile
Level       Spelling     Arithmetic and Mathematics

95th        PARALLEL     VI + 52 =
85th        ACQUIRED     What is simple interest on £250 for
                         2 years at 5 per cent?
75th        DISEASE      What is the area in square yards of a
                         30 ft. by 24 ft. room?
65th        SCARCELY     If 3x + 2 = 14, x =
55th        CONCEALED    Write one-quarter as a decimal
45th        SCHOLAR      Simplify  8 = 2
                                   26 52
35th        PRESSURE     ł+4=
25th        JUICE        How many pints in a gallon and a
                         half?
15th        DOZEN        Subtract:   s.  d.
                                     8   6
                                    -4   $
5th         TABLE        Add   SSS ia
                              +715


There are, it should be noted, considerable sex differences,
females being relatively superior in spelling and inferior in
arithmetic. Moreover, the standards of 14-15 year olds tend to be
somewhat higher than those of young adults. For there is strong
evidence that the attainments of school-leavers in these drilled
subjects are maintained or improved only among the more able who
continue their schooling, or who enter jobs where they continue
to practise their skills (cf. Norris, 1940). The average pupil tends
to decline appreciably by 18, and the more backward ones more
rapidly. Indeed, the lowest 10 per cent of pupils who approximated
(at best) to 11-year level in arithmetic and spelling before leaving,
have dropped back to 8-year level some 4 years later. Reading
comprehension, by contrast, seems to go on improving, since
the great majority do continue some sort of reading in daily
life.


A short digression is appropriate here on the alleged decline in 
educational standards over the past 50 years, which is often
bewailed by public speakers and writers to the Press. We suspect that,
apart from the war-time decline previously mentioned (p. 116),
these judgments are based almost wholly on comparisons between
unrepresentative samples. For example, there are more than twice
as many grammar schools as there were 50 years ago and, despite
more efficient selection nowadays (by ability rather than by in- 
come), this inevitably means that somewhat lower ability strata 
are admitted. The best grammar school pupils are probably no 
worse, but they are more diluted. Equally, present-day modern 
and unreorganised schools are more effectively creamed, so that 
their average standard must be lower too, unless the efficiency of 
their education has much improved.1 An equally striking
redistribution of employment has occurred between about 1910 and
1950. The numbers engaged in Textiles, Clothing, Mining,
Agriculture and Fishing have dropped from 29½ per cent to
18½ per cent; but the numbers in Metal and Chemical
Manufacturing and Engineering, in Administrative and Professional
Services have risen from 20½ per cent to 37½ per cent. Such
favoured occupations as electrical and other skilled engineering, 
the Civil Service and Local Government are probably attracting 
so much more of the available talent that other employers have to
go much farther down the scale than they used. Thus business
men's secretaries may seem 'illiterate' nowadays, but not because
there has been any decline in the amount of intelligence or educa- 
tion in the population as a whole. The critics who deplore declines 
in cultural interests are also probably not comparing like with 
like. They are usually people of good intelligence and education 
who base their standards on a circle of acquaintances similar to 
themselves, and are shocked by the leisure pursuits, or the
favourite newspapers and broadcasts of the mass of the present-day
population. A fairer comparison would be between the present
poorly cultured majority and the slum-dweller of 40 years ago
who worked much longer hours and whose favourite leisure 
pursuit was alcohol. Actually, there is ample evidence of far more 
library reading, more adult education classes, more amateur 
musical and dramatic societies now than in the past. 

1 It is possible for average levels in both types of school to fall
without the overall average changing, as these hypothetical figures show:

Grammar school 14 per cent with mean E.Q. [illegible] + Modern school
86 per cent with mean E.Q. 96 give an overall E.Q. 100.

Grammar school 28 per cent with mean E.Q. 121 + Modern school 72
per cent with mean E.Q. 92 also gives an average of 100.
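
The arithmetic behind this footnote is a weighted average, and can be sketched as follows. The function names are introduced here for illustration only. The sketch assumes that the two types of school together account for the whole age-group (so 14 per cent goes with 86 per cent, and 28 per cent with 72 per cent); the earlier grammar-school mean is not legible in the footnote, so it is solved for rather than quoted.

```python
def overall_mean(groups):
    """Weighted mean E.Q. for a list of (per cent of pupils, mean E.Q.) pairs."""
    total_pct = sum(pct for pct, _ in groups)
    return sum(pct * mean for pct, mean in groups) / total_pct


def implied_mean(pct, other_pct, other_mean, target=100.0):
    """Mean one group must have for the overall mean to equal `target`."""
    return (target * (pct + other_pct) - other_pct * other_mean) / pct


# Later position: grammar 28 per cent at 121, modern 72 per cent at 92.
later = overall_mean([(28, 121), (72, 92)])

# Earlier position: modern 86 per cent at 96; grammar 14 per cent, mean unknown.
earlier_grammar = implied_mean(14, 86, 96)

print(f"later overall mean E.Q.: {later:.2f}")
print(f"implied earlier grammar-school mean E.Q.: {earlier_grammar:.1f}")
```

On these assumptions both school averages are lower in the later position (121 against roughly 125, and 92 against 96) while the overall mean stays at about 100, because a larger share of pupils has moved into the higher-scoring group.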
CHARACTERISTICS OF THE TAILS OF THE DISTRIBUTION:
MENTAL DEFECTIVES 


Mental deficiency is a legal rather than a psychological term.
It refers to adults who 'require care, supervision and control for
their own protection or for the protection of others', that is
roughly the ½ per cent most incapable of the population. Certifiable
children were defined as those 'incapable of receiving proper
benefit from the instruction in ordinary schools', that is about
2½ per cent. However, with the passing of the 1944 Education Act,
the term was restricted to relatively low-grade children or those
presenting serious behaviour problems; while the lowest 10 per
cent or so of the school population in attainments (those with
E.Q.s below 80, regardless of I.Q.) are classed as Educationally
Subnormal or E.S.N. Because of insufficient provision, only a
small proportion of the most backward are, in fact, transferred to 
special schools or classes. Likewise, the number of available places 
in institutions and hospitals largely determines how many de- 
fectives are certified. 

While certification (in the case of children, called ‘ascertain- 
ment’) is rightly based on social and educational criteria, and is 
done by a doctor, it is usual to have them tested, preferably by a 
qualified psychologist, with the Terman-Merrill or Wechsler 
scales if only because these provide a more objective criterion of 
intellectual subnormality than do judgments of adjustment. 
Nevertheless, there are many certified adults and children with 
I.Q.s of 80 and over who are too emotionally unstable to be 
allowed to go at large, and many with I.Q.s of 60 or under who
are sufficiently stable to get along fairly well in school or in 
employment. On the whole, however, low intelligence and poor 
adjustment tend to go together; defectives are also poorer in 
health and physique than normals. Clarke (1958) brings out the
tremendous variation in types and causes of mental deficiency.
The lower-grade imbeciles and idiots, as we have seen, arise
largely from pathological causes and/or rare genes, whereas the
higher-grade feebleminded are often referred to as 'sub-cultural'.
Some of the latter may show pathological symptoms, and prob- 
ably most of them are low in Intelligence A, but generally also 
they are victims of their environment. The lower grades never
progress much beyond the level of development of young children,
and have to be cared for at home or in institutions; many cannot
learn to speak or to keep themselves clean, though others can
sometimes be taught simple routine activities, or housework or gardening.

With high-grade morons and the top of the imbecile group, 
however, it would be rash to set any such limit (cf. Wallin, 1956; 
Sarason and Gladwin, 1958; Clarke, 1958). It used to be said that 
they were educable roughly up to their maximum Mental Age; 
for example, a 15-year old with I.Q. 60 should be able to read as 
well as an average 9-year old. And it was believed that they could 
succeed only in unskilled employment under sheltered conditions. 
Many, in fact, never make appreciable educational progress, either 
through poor adjustment, continued adverse home circum- 
stances, or through vegetating in special schools or institutions. 
But others are more successful (even if Schmidt's claims are
greatly exaggerated); and they can often be trained in fairly 
skilled work, though taking a good deal longer to learn than 
normals. Indeed, their practical capacities are likely to be superior 
to their verbal ones (cf. p. 118). (The occurrence of idiots savants 
who combine exceptional numerical or other talents with very 
low intelligence is extremely rare.) They need more guidance and 
supervision than average persons, but they respond normally to 
incentives and to a helpful environment. 
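
The older rule of thumb mentioned above follows from the ratio definition of the I.Q. (Mental Age divided by Chronological Age, multiplied by 100). A minimal sketch, with a function name chosen here purely for illustration:

```python
def mental_age(iq: float, chronological_age: float) -> float:
    """Mental Age implied by the ratio I.Q.: I.Q. = 100 * M.A. / C.A."""
    return iq * chronological_age / 100


# The example in the text: a 15-year old with I.Q. 60.
print(mental_age(60, 15))
```

This reproduces the figure of a 9-year level for a 15-year old with I.Q. 60; as the text goes on to argue, it is a description of test performance rather than a ceiling on what such a person can be taught.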

It has often been claimed that defective children and adults are
particularly apt to become delinquents and criminals, presumably
on the grounds that crime is an unintelligent way of gaining one's
ends. But it is quite untrue that even a majority of criminals and
delinquents are of very low intelligence; many score well above
average. Indeed, some investigators have obtained distributions
very similar to those in the non-criminal population, though the
majority would place their mean I.Q. round about [illegible] 90, i.e.
in the backward class. The interpretation even of this degree of
subnormality is dubious. After all, unintelligent delinquents are
more likely to be caught than intelligent ones. Moreover we tend
to find both more crime and more persons of low intelligence in
overcrowded, slum conditions and in the lower socio-economic
grades. Thus the association may be as spurious as that which
doubtless exists between criminality and, say, a liking for fish
and chips. The sensible conclusion would seem to be that of Burt,
in his Young Delinquent (1925a), that low intelligence may be one
factor in delinquency, though usually combined with
temperamental, environmental and other factors. Apart from these
handicaps the defective child, according to Wallin, is more likely to be
happy-go-lucky and tractable than criminal.


GENIUS AND THE HIGHLY INTELLIGENT 


A mine of information on the characteristics of the top end of 
the distribution is provided by Terman’s Genetic Studies of Genius 
(1925-47). Vol. II selects some three hundred persons of
acknowledged eminence from history and, by a careful analysis of
their biographies as children and their recorded achievements at
various ages, tries to estimate their childhood I.Q.s. A few, such as
John Stuart Mill, Grotius and Leibnitz, were clearly capable at the
age of, say, 6 of intellectual operations that would tax a normal
12-year old; and their brilliance persisted. Thus it is reasonable to
credit them with I.Q.s approaching 200. The average for the 
whole group was 155; that is, very superior. But this means that 
some were distinctly lower. Oliver Cromwell, for example, was 
estimated at 110 only. However, the proportion showing little 
promise in childhood was far smaller than many laymen imagine. 
Another common misbelief is that genius tends to be narrowly 
confined to special fields. Actually, versatility was the rule rather
than the exception. Apart perhaps from musicians and artists,
those who were outstanding in one special field were generally
above average in several other fields. Thus g-factor theory still
has some application even at this level. But Terman freely admits
that traits of personality are at least as important as intelligence in
the formation of genius, and points out, too, how often historical 
accident and favourable environment play a part. 

The other volumes start, as it were, at the other end and study 
the characteristics of some 1,500 children with I.Q.s of 135
upwards, representing the brightest 1 per cent of the Californian
10-16 year population. These have now been followed over 35 
years, and though Terman does not claim that any are or will be 
geniuses, they certainly show very remarkable achievements. 

When originally compared with a large control group of normal
children they were far advanced educationally in all subjects, as
would be expected. Their average E.Q. was 140, though in terms
of school grades it was only 114 because the schools had failed to
realise their superiority. Less expected was their general superiority
in health and growth. Probably it is because the very bright child
is so often the youngest in his class and is being compared with
much older average children that he is [illegible]
puny. Their range of interests was also wider and superior in
quality. For instance, they read more and better books and made 
twice as many collections. So far as could be determined from 
tests and ratings, they tended also to be superior in social and 
emotional development and moral character. It did seem,
however, as though the brightest of all, with I.Q.s 170+, tend rather
more frequently to experience difficulties of adjustment at home 
or school unless brought up with other bright children. 

Vol. IV, which traces the adult characteristics of 96 per cent of
the original group, is particularly interesting. The occupational
distribution for men, compared with that for the general
population, is shown in Table XII. It will be seen that over two-thirds are


TABLE XII

OCCUPATIONAL DISTRIBUTION FOR TERMAN'S
GIFTED GROUP (MALE)

Occupational Group                        Gifted    General
                                                    Population
Professional                               45.4       5.7
Semi-professional and higher business      25.7       8.1
Clerical and skilled trades                20.7      24.3
Farming                                     1.2      12.4
Semi-skilled                                6.2      31.6
Slightly skilled and labouring              0.7      17.8

in the two highest grades, but that at the same time there is a
small proportion, 8 per cent, in the three lowest (cf. p. 123).1
Similarly, over two-thirds of men and women had graduated from
college, though 5 per cent also failed. Twice as many had obtained
higher degrees as do graduates in general, and five to eight times
as many had Ph.D.s. Yet the gifted group also showed a superior
[illegible] of extra-curricular activities; they were not merely
bookworms. There was no evidence that educational acceleration (i.e. going
to college at an early age) had been in any way harmful. Again,
a larger proportion of men than in the general population had
had war service, and nearly 75 per cent of these obtained
commissions. The delinquency and crime rate of the gifted group was
extremely low, and the incidence of mental breakdown was no
greater than in the population at large. There was no indication
that they tended to become 'cranks'; thus their political affiliations
were similar to those of the general population. Their marriage
rate was higher than that of average graduates, and their divorce
rate normal. Their fertility so far was low, but their children,
many of whom have been tested, had a high mean I.Q. of 128.
This figure is, of course, lower than that of the gifted parents, as
would be expected both because the spouse was usually less
intelligent and because of regression. Seven per cent of the children
fell below 100, but 15 per cent (as compared with about 1 per cent
in the general population) scored in the same high category as the
parent.

1 [illegible] rise. By 1955, 86.3 per cent
were in [illegible] the top two groups.

Much additional case-study material is given on the achieve- 
ments, publications, etc., of individual members of the group. 
Special attention is also paid to those whose careers were relatively
unsuccessful. Their initial ability and school records were no whit
inferior, though they were clearly beginning to drop behind
in college. Their environments tended to be less favourable, and
in most instances certain weaknesses of character and emotional
adjustment had begun to show themselves in childhood.
Nevertheless, Terman's researches certainly demonstrate that a high I.Q.
in childhood is strongly predictive of superior educational
achievement and a very favourable factor in the vocational and other
spheres. The child of exceptionally low intelligence shows much
the same characteristics in reverse.


SEX DIFFERENCES 


Numerous comparisons have shown the average scores of boys 
and girls, or men and women, to be the same on general intelli- 


gence tests, though there are some differences on the group factors 


that enter into many tests. Girls do a little better on most verbal


tests and on tests involving rote memory, boys on tests of in- 
ductive reasoning and arithmetical ability. But there is a great deal 


1 From about 1940 to 1950 girls usually did better than boys at the intelligence 
and arithmetic group tests in 11-year selection examinations, as well as in Eng- 
lish, and different borderlines often had to be drawn for entry to grammar 
of overlapping, and the average differences seldom exceed about
4 points of I.Q. The most marked difference occurs on spatial 
and mechanical tests. Whether there is any innate factor in boys’ 
superiority, we do not know. It might well be attributed to the 
cultural influences in our own civilisation which encourage boys 
to develop physical, constructional and mechanical interests. 

Many surveys of intelligence and attainments have also demon- 
strated that the range or spread of ability (as distinct from the 
average performance) is slightly more restricted in girls. The 
evidence is not unanimous, but there do seem to be larger pro- 
portions of very able and of very backward boys. Society also 
recognises more men than women of outstanding talent in almost 
every walk of life, and there are more male mental defectives. 
Obviously it would be rash to argue that innate factors are 
responsible, since our culture supplies far less encouragement and 
opportunity to women. Moreover, female defectives are less
troublesome to look after at home, so that this difference might be 
merely one of ascertainment. Again, any difference may be not so 
much one of ability as of temperamental traits and interests, in- 
nate and acquired. The boy, in our Western culture at least, tends 
to be more rebellious and more ambitious than the girl, and may 
thus either fall further behind or progress further ahead in school
work and in intellectual activities generally. 


AGE CHANGES 

One of the remarkable findings of American Army testing in
1917-18 was that adult recruits scored no higher than 13-year
children. Other tests, too, have shown little or no increase beyond
15, and it was widely accepted that intellectual growth reaches its
maximum at about this age and then stops. Information,
occupational skills and worldly experience obviously go on
increasing much later, but these were distinguished from
intelligence, the capacity for problem solving, for seeing relations and
for new learning, which not only failed to improve but soon
started to decline. On tests of a more informational type like
Vocabulary there seem to be rises up to the 30s, and relatively

schools. It seems likely that boys were more seriously affected by war-time
relaxations of home discipline and upsets to schooling. By the later 1950s, boys
had generally caught up in intelligence tests and regained their superiority in
arithmetic (cf. Emmett, 1954).
slow decay. But on abstraction and non-verbal tests (such
as Progressive Matrices) the decline begins in the [illegible]
earlier (cf. Vernon and Parry, 1949). Fig. 7 is adapted from [illegible]
standardisation of his Vocabulary [illegible]
range 5 to 65 years (Foulds and Raven, 1948). By 60, the average
adult has dropped back to

Matrices.1 Similarly, 
Information, Compr 
with age; Block D 
Memory decline much more ra 


F ifferent- 
these results; based oncomparisons between diffe 
aged cross-secti 


siderable cauti 


their snags. The We 
Items, but the test-sophisticated or better i Jev 
i they have been above average in initi: se. 
an therefore —as Fig. 7 shows—less apt to decline in any “A 
Particularly illuminating is the work of Welford (1958) am jety 
ology Department on ageing. Using a od that 
Ulsition of skilled movements, they eters 
; ‘© compensate for such © 
handicaps as decreased visual and au Be acuity, speed oe 
of their greater experience t ie, 
» at familiar tasks, especially tho their 
cy and Fsponsibility which can be done at ge 
1 Raven's more recent data ine in Vocabulary after 3 
60 than is shown in the ee ae 5 steeper decline in Vocal i 


à 4 ure 
‘ 5 gh it is extremely difficult to ens 
samples of ‘normal older Persons are teally representative. 
own rate. They are less good when they have to work under stress
of speed, and are generally slower and less mentally alert. But they 
chiefly show their age in comprehending, interpreting and organ- 
ising new material—whether it be the perceptual and motor 
elements in a skilled job or the verbal data in a logical or intelli- 
gence test problem. Indeed, their previous habitual modes of 
response often make them unduly inflexible and less capable of 
adapting to these new requirements. Similarly, the work of Thorndike
(1928) and others shows that adults of, say, 40 to 50 are little
inferior to 15-20-year olds in the learning of straightforward 
topics like a new foreign language, but are much poorer at tasks 
which involve the breaking down of old habits, as in foreign 
language pronunciation. 

Clearly it is too superficial to think of intelligence as a single 
entity growing to a maximum and declining. Rather there are 
qualitative changes with age, and our conventional intelligence 
tests hardly give scope to the ways in which the older person 
expresses his ability, though they rightly show that he is handi- 
capped in rapid adaptation to, and solution of, new mental or 
physical problems. In other respects which we call wisdom, 
experience, judgment, knowledge and skill, he is a more capable 
member of the community until such time as his deteriorating 
sensory and motor capacities and increasing mental rigidity begin 
to outweigh these assets. 


GEOGRAPHICAL DIFFERENCES 


No one has tested sufficiently representative samples of the
total U.K. population to say precisely which are the most or least
intelligent regions. Nevertheless, both Moray House results at 11
(Emmett, 1954), and the writer’s analysis of Army National Service
tests (Vernon, 1951) show distinct differences between
counties, ranging up to 6 or even 10 points of I.Q., though these
two enquiries largely disagree in their findings. A few generalisations
are possible.

1. Much the most intelligent areas are those on the outskirts of
large cities, for example the Home Counties around London; and the
lowest are the overcrowded areas within such cities. This
naturally follows from the distribution of intelligence among
occupational classes.

2. Areas with large Irish populations (e.g. Glasgow and Liverpool)
and Welsh counties tend to score below average. This is
hardly due to language handicaps, since it is found also with
non-verbal tests. It too reflects occupational differences.

3. Urban children in all areas average some 2 points higher than
rural, though this varies considerably from county to county, and
has not been confirmed among adults. Numerous factors are
involved here, and we would merely be airing our prejudices if we
tried to assess the relative importance of the following:

(a) Brighter families may be more apt to migrate to
towns.

(b) The tests may favour the town child, both in their content
or in the knowledge they assume and in their emphasis on quickness
of thinking.

(c) Town children may be more sophisticated at taking tests
and better stimulated by the education received in the large urban
schools.


RACIAL AND NATIONAL DIFFERENCES 


A similar situation is found when we compare test results in
different parts of the world, namely, there are quite large average
differences (with much overlapping) but also grave doubts regarding
their proper interpretation. The most extensive data are those
from American Army tests of 1917-18, when hundreds of thousands
of recruits were foreign born or the children of immigrants.
Classified by country of origin, the approximate order of ability
was: English, Scottish, Dutch, Danish, German, Canadian,
Swedish, Norwegian, Irish, Austrian, Belgian, Turkish, Greek,
Russian, Italian, Polish, Negro. These results are particularly
questionable since it is unlikely that the immigrants were fair
samples of their nations. Thus the British were mostly descended
from the original pioneer stocks, whereas the Irish, Poles and
others came from poorer peasant families emigrating in the late
nineteenth century. It would be easy to prove the Irish more
intelligent than the English by drawing samples from the best
suburbs of Dublin and the worst slums of London.

However, other more careful studies confirm the general trend.
Jews often get the highest average (say, I.Q. 105). American whites
and north-west Europeans are all much the same, southern and
eastern Europeans together with English-speaking Chinese and
Japanese somewhat lower (say, 90 to 95), then American negroes
(about 85), and Australian aborigines and other relatively primitive
peoples lower still. At the same time individual differences
within racial or national groups are far larger than differences 
between groups. At least 10 per cent of negroes surpass the average 
American white, and 10 per cent of whites score lower than the 
average negro. 

There is probably no reputable psychologist nowadays who 
would maintain that these results represent genuine innate racial 
differences. Several might state the exact opposite, pointing out 
that the superior groups are just those which provide the best 
economic and social conditions and the best education. But the 
majority would be more likely to say that we cannot really make 
valid comparisons at all, since no tests can be devised which are 
‘culturally neutral’—that is, equally fair to groups with very
different upbringing.

In a famous investigation, Klineberg (1935) tested negroes who
had lived various lengths of time in New York; that is, in an area
where their education and economic opportunities approximate
much more closely to those of whites than in other areas from
which they had moved, such as the Southern States. Those with
less than a year’s residence obtained an average Stanford-Binet
I.Q. of 81½, those with 4 years or
more marked on a verbal group test, DU j 
Paper Formboard (spatial) test. While this shows the importance 
of cultural factors, the fact that 


far ther towards 100 might be take 


important. Many investigators h: 

different parts of the world (e.g. Porteus with his Mazes). Though
the national and racial differences tend to be less marked than with
verbal tests, they do not disappear; thus they are not purely
linguistic or educational. But this does not prove anything, for it is
obvious that white children gain far more experience with pictures,
in manipulating blocks, or in drawing—i.e. with the kinds
of materials and operations involved in performance tests—than
do the children of more so-called backward peoples. However,

studies of pre-school children generally yield rather smaller
differences, and the inferiority is most marked when older negro
children or young adults are tested for powers of abstract thinking
(cf. Shuey, 1958). From such results, it seems quite probable
that genetic differences between racial an
—at least in some aspects of intellectual ability—though we have
no satisfactory way of proving it.

Biesheuvel, who has wide experience in the testing of
Africans, points out the fallacies of assuming that, by using pictures
of objects familiar to the people concerned or abstract diagrams
and shapes, cultural influences can be eliminated. The whole conception
of pictorial representation on paper is outside the experience
of many African tribes, and they may fail to recognise such
pictures though quite familiar with similar carvings on ivory or
leather. Again, their concepts of space and direction probably
differ in various respects from our own, so that diagrammatic
tests may present quite unforeseen problems to them. We are
apt to forget also the importance of the testee’s attitude or ‘set’ (cf.
p. 73), since we have become so used, as school children, to
answering silly questions put to us by adults as quickly as possible.
In addition to educational handicaps, the African may do badly on
tests devised by Europeans because this attitude of competition
and rapid response is unnatural to his culture. Even as near home
as the Hebridean islands, it was found that the more leisurely
tempo of the children’s lives greatly affected their responses to
intelligence tests given in Gaelic (Smith, 1948). Again, among
Australian natives the accepted reaction to difficult problems is
group consultation, not individual effort. These natives are in
many respects highly ‘intelligent’ in relation to their barren environment,
though obviously backward by European standards.
It is for such reasons that psychologists like Biesheuvel, together
with most anthropologists, regard attempts to compare the
intelligence of different cultural groups as futile.

If we are to make progress it is likely that verbal tests, preferably
given orally in the vernacular, provide a better medium than
any other, since language is a universal instrument for the handling
of complex intellectual problems. Reasonable comparisons
might be made between pairs of groups whose modes of thought,
grammar and syntax have been shown, by anthropological and
linguistic studies, to be closely similar; for we might then be able
to construct tests which presented comparable problems to members
of each.

These somewhat pessimistic conclusions do not imply that
intelligence and other tests cannot be useful within groups other 
than western European and American. They can be, and have
been, constructed and applied for educational and vocational
selection and other purposes in many parts of Africa and India. 
Biesheuvel, for example, has an extensive battery of performance 
and manual tests for South African mineworkers which are given 
in group form with cinematograph demonstrations of what the 
testees have to do. A particularly illuminating piece of work is 
Scott’s (1950) production of group verbal tests for selection for 
intermediate and secondary schools in the Sudan. An oral test, 
based on Ballard’s, was devised for the former level (about 11 
years), and a written one, based on Moray House, for the latter 
(about 16 years). But straight translations of these models into 
Arabic were useless. Many of the original items aroused quite 
different associations among Sudanese pupils, and only after 
lengthy experimentation with some 900 items were sufficient 
suitable ones obtained. Scott also found it necessary to introduce a 
lengthy process of ‘warming up’ or preliminary practice, and to 
give much more time than in an English setting. But satisfactory 
reliability coefficients and correlations with subsequent educational 
success were eventually achieved.



Chapter Eleven 


EDUCATIONAL AND VOCATIONAL 
IMPLICATIONS OF INTELLIGENCE TESTS 


FACTOR ANALYSIS 


SOME summing up of the results of factorial studies is desirable 
before we consider the relevance of intelligence tests to educational
and vocational guidance and selection. It was shown in
Chapter One that one of the most fruitful sources of information
on what tests measure is an analysis of the way they tend to group
together. If a dozen tests, A, B, ... L are given to the same
testees, and the correlations of A, C, D and F with one another
are higher than with the remainder, we can observe the common
features in these tests which are not present in the others and deduce
the nature of the group factor that is operating. The appropriate
statistical techniques developed by Spearman, Burt and Thurstone
enable us to measure how much of the general factor, which runs
through the whole battery, and of this or other group factors, is
present in each test and how much is specific.
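
As a rough illustration of how such an analysis apportions a test's variance, the sketch below assumes a simple model with one general factor and one group factor that are uncorrelated. The test labels and loadings are invented for the example and are not taken from any study cited here. Under that assumption the correlation between two tests is the sum of the products of their loadings on the shared factors, and whatever variance the common factors leave over is counted as specific (including error).

```python
# Illustrative only: hypothetical loadings on a general factor (g) and one
# group factor (v), assumed uncorrelated. Not data from any cited study.
loadings = {
    "A": {"g": 0.7, "v": 0.5},
    "C": {"g": 0.6, "v": 0.6},
    "B": {"g": 0.7, "v": 0.0},
    "E": {"g": 0.5, "v": 0.0},
}


def implied_correlation(x, y):
    """Correlation implied by shared factors: sum of products of loadings."""
    return sum(loadings[x][f] * loadings[y][f] for f in ("g", "v"))


def variance_shares(test):
    """Split a test's unit variance into g, group and specific parts."""
    g_part = loadings[test]["g"] ** 2
    group_part = loadings[test]["v"] ** 2
    return {"g": g_part, "group": group_part,
            "specific": 1.0 - g_part - group_part}


if __name__ == "__main__":
    for pair in (("A", "C"), ("A", "B"), ("B", "E")):
        print(pair, round(implied_correlation(*pair), 2))
    for test in loadings:
        print(test, {k: round(v, 2) for k, v in variance_shares(test).items()})
```

With these invented loadings, tests A and C, which share the group factor, correlate more highly with each other than either does with B or E, which is the pattern of grouping described above.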

The following seem to be the main factors involved in the 
tests which we have described in Chapters Four to Six. 

g. We have shown that g cannot be identified with any clear-cut
mental faculty, but is rather the common element remaining
once the group factors present in all tests have been allowed for;
also that it is likely to differ in nature among younger and older
children and adults. A further complication is that it is always
most prominent in heterogeneous populations such as a complete
age-group of children or unselected adult recruits, but may almost
vanish if a selected group of narrow ability range is studied, such
as university students or Army officers (hence the preference of
Thurstone, Guilford and other American factorists for analysing
abilities entirely in terms of group factors). Very probably, again,
g is more pronounced among low-grade populations; thus
manual and mental tests tend to inter-correlate more highly
among defectives.


Nevertheless, we can agree with Spearman that it is something
in the nature of grasping relationships, and is most strongly evident
in tests of the more complex intellectual functions such as
reasoning and abstraction, in the learning of classics and in
problem arithmetic. Indeed, it is just as plausible to analyse the
main factors present in ordinary intelligence tests into Thurstone’s
R (reasoning) + V factors as into g + a verbal group factor. It is
less prominent in more rote abilities such as memorisation,
mechanical ability and occupational skills, and (within the normal
range) is almost absent in manual and physical capacities. But
there are no pure g tests, and even the Terman-Merrill, or verbal
abstraction tests or non-verbal tests like Progressive Matrices, do
not involve more than about 50 per cent of g.

v. The verbal factor enters into all verbal tests such as vocabulary,
analogies, reading comprehension, spelling, etc., over and
above g, and is so largely dependent on schooling that the writer
refers to it elsewhere as the v:ed factor (Vernon, 1950). Particularly
among unselected adults it tends to interact so closely with
g that it is difficult to distinguish general intelligence from amount
of education received. But in more selected groups—grammar
school pupils, for example—it may differentiate into a number of
factors along the lines of different school subjects. Thus, most
pupils who are above average in English composition will also
be good in foreign languages because of their common g and v
content, but some may be relatively stronger in one than the
other, indicating the influence of minor specialised abilities.
Verbal psychological tests can similarly be classified under partially
distinct factors of verbal comprehension, fluency, induction and
eduction, concept-formation, etc., and—at the highest levels—
under Guilford’s long list of evaluative, planning, creativity and
other factors. However, there is no general agreement yet among
factorists as to the number and precise specification of these sub-factors.

n, or number ability, likewise tends to overlap with the verbal-educational
factor. It is most clearly present in mechanical arithmetic
tests. More advanced arithmetic and mathematics bring in a
larger g component; and at secondary or higher levels there is
likely to be some differentiation of geometrical, scientific and
other specialised abilities.

k:m and S. Earlier writers identified a k (kinesthetic) factor in
tests involving the manipulation of shapes in the imagination, and
an m (mechanical) factor either in tests of a fies ele naa
type or in those based on pictures of mechanisms. Mechanical
ability, however, seems to resolve almost completely into k on the
one hand and knowledge of, or experience with, mechanical
things on the other. Physical capacities and manual dexterities
also overlap with this factor, though they are largely specific.
For example, quickness in screwing up nuts and bolts correlates
little with speed of running or reaction time, or with ability at a
formboard test like the Goddard; and none of these agree very
closely with any trade skill. Thus k:m stands for the complex
which we call practical, as contrasted with academic or v:ed
factor. The more difficult performance tests such as Kohs Blocks
and Cube Construction measure the same k ability as do paper-and-pencil
spatial tests (together with g and specific factors); thus they
are useful. But the simpler picture and formboard
tests are mostly rather unreliable g tests with large specific components.
Hence, as we have pointed out, even a reliable performance test
cannot claim any special relevance to intelligence in practical
daily-life activities, and is definitely inferior for educational predictions.
Group tests based on pictures or diagrams likewise measure
no very clear factors apart from g and specifics, though sometimes
showing a small k component when spatial imagery enters.

The S factor of American writers is much the same as k, but in
this area, too, they often break it down into several partially distinct
abilities. Guilford, for instance, finds a Visualisation factor
which he contrasts with Spatial Relations Ability.

M. Thurstone and others have found a distinct rote memory
factor in tests involving the learning and immediate reproduction
of non-meaningful material (digits, nonsense syllables, shapes,
etc.). This has little or no bearing on school work, and meaningful
learning is almost wholly a matter of g and v, and specialised
ability at or interest in the particular subject.

X. Attainment tests and school marks and examinations always
show closer overlapping with one another than with intelligence
tests, and this has been attributed by Alexander (1935) to an
industriousness factor, which he calls X. But X is not merely a
character trait among the pupils; it combines all the conditions,
such as parental encouragement, good teaching, etc., which
differentiate a pupil’s E.Q. from his I.Q. Nor can we, at present,
measure X independently; personality tests and ratings by
teachers of the children or of home conditions are not yet
sufficiently accurate to be of much practical value (cf. Vernon,
1953). However, it is useful to realise, in educational selection, that
we do need to assess: g + v + n + X. Intelligence tests do not
cover X at all; standardised attainment tests do so to some extent,
but not so well as the more orthodox type of school examination
or teachers’ estimates of attainments and promise.

Formal Factors and Conditions of Work. When ordinary group 
intelligence tests, based mainly on choice-response items, are 
compared with creative-response group or individual tests, the 
former show a distinct factor, presumably representing facility at 
this rather artificial type of question. Very likely it is involved, too, 
in English attainment, reading comprehension, and other new- 
type tests. Probably it is for this reason that school teachers often 
find group test results conflicting with their observations of 
pupils’ abilities and criticise them for measuring ‘slickness’.1
Actually they do largely depend on g and v, but bring in this
irrelevant component as well. The Terman-Merrill avoids it, and
so appears to give a more valid indication of intelligence at school 
and in daily life, although it is less scientifically constructed than 
many group tests. 

Next we must consider the influence of speed of work in tests
given with a time-limit. This has been a topic of controversy ever
since the first American Army tests were constructed. Ballard
(1922) criticised their emphasis on speed, and many educationists
have expressed similar opinions. Most psychologists, however,
finding extremely close agreement between the results of speeded
and unspeeded tests, have disputed this and have followed the
American model, largely because it is much simpler to invent easy
than difficult test items and much more convenient administratively
if all testees finish tests at the same moment. Actually,
almost all group tests involve ‘speed’ and ‘power’. The first half
of the items may be done rapidly and without much thought by
the bright pupil, but his g and v mainly determine his capacity at
the final band of items where he reaches his limit; while the less
bright pupil manages a quarter or less at speed and then needs all
his g and v to score on some of the middle band.

1 There are several other reasons, of course, such as the teacher’s failure to
allow for the child’s chronological age and his tendency to confuse ability
with character, perseverance and likeability.

More recent researches reveal various motor speed factors, or
quickness at manual and other tasks, also P—perceptual speed (cf.
p. 86)—but give little support to the notion of a general mental
speed or slickness, as contrasted with more ‘profound’ ability. 
Rather it appears that certain attitudes to work are involved at 
different levels of difficulty. If a mental test consists entirely of 
rather easy items and the score is based on number correct in a 
given time, some pupils or adults will concentrate mainly on 
speed at the expense of accuracy, some the reverse. On the other 
hand, if the test consists of difficult questions with no time-limit, 
the score will depend as much on persistence at that type of 
material as on intellectual ability (cf. Mangan, 1959). Testing at 
either of these two extremes is not very suitable for intelligence 
measurement or for educational prediction, since the results are 
too greatly affected by the testee’s motivation and attitude to the 
test. Hence the majority of group tests fall roughly in the middle 
range of difficulty. However, in the writer’s judgment our present 
tests for secondary school selection are inclined somewhat too far 
to the Speed v. Accuracy type, and might with advantage in- 
clude more difficult problems and more generous time-limits. 
This may be another reason why the unspeeded Terman-Merrill 
test appears to give such useful results. Similarly, in arithmetic, 
Sutherland (1952) found slightly better predictions of secondary 
school work from tests with longer time-limits. But we should 
not jump to the conclusion that timed tests select the careless and 
untimed the persistent pupil; the attitudes involved in such tests 
are highly complex and probably bear little relation to the attitudes
involved in slipshod and careful school work. And we must
insist that tests given under these different conditions nevertheless
measure very largely the same combination of g, v, n or other
abilities.


INTELLIGENCE TESTS AND EDUCATION 


We are now in a position to understand more clearly the educational
uses of intelligence tests. A good individual test applied
between 5½ and 9½, or a thorough group test given from about
10 onwards, mainly measures the same g and v factors that enter
into all-round school attainment. We can no longer claim that
they show innate intelligence or capacity for learning; but since
they depend on level of concept development and generalised
thinking skills they should, in most cases, also give an indication
of the level of school work that children are capable of tackling. 
And in so far as they are somewhat less affected than are measures 
of attainment by good or poor teaching, environmental handicaps 
or advantages, temperamental factors and emotional adjustment, 
they can be interpreted as showing educational potentiality. At 
the same time we must remember that conditions that improve or 
harm attainment may also quite likely affect intelligence test 
results, even if in lesser degree. 

The correlation between g + v tests and general attainment is
very high, at least in a primary school age-group—approximating
0.85. Nevertheless, this allows of considerable discrepancies in
individual cases. Such discrepancies would be expected both because of
the imperfect reliability of the tests or assessments and because
each includes some different group factors. The complex of conditions
that we called the X-factor, together with specialised
abilities at particular subjects, enter into attainment, and the
irrelevant formal factors, discussed above, often affect intelligence
tests. (Thus a child’s I.Q. may be lower than his E.Q. if he is
relatively poor in answering multiple-choice questions at speed, or
if he is strongly interested in school and well taught so that his
English and arithmetic skills are in advance of his wider thinking
capacities.) Again, if the E.Q. falls markedly below the I.Q., this
does not necessarily show serious retardation (cf. p. 120) attributable
to environmental or temperamental handicaps: the form or
content of the intelligence test may happen to suit him. In other
words, differences between ‘intelligence’ and ‘attainments’ are
very much on a par with differences between arithmetic and
reading, or any other highly correlated abilities. They may be a
sign of special difficulties which require psychological treatment
or remedial education, but they may also be due to a variety of
other causes.
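
To give a sense of scale for such discrepancies, the sketch below assumes that I.Q. and E.Q. are both standardised to a mean of 100 and a standard deviation of 15 and are jointly normal with a correlation of 0.85. The standard deviation of 15 and the 10-point threshold are assumptions chosen for illustration, not figures from the studies cited.

```python
from math import sqrt
from statistics import NormalDist


def difference_sd(r, sd=15.0):
    """SD of the difference between two equally scaled scores correlated r."""
    return sd * sqrt(2.0 * (1.0 - r))


def share_with_gap(r, gap, sd=15.0):
    """Proportion whose two scores differ by at least `gap` points either way."""
    z = gap / difference_sd(r, sd)
    return 2.0 * (1.0 - NormalDist().cdf(z))


if __name__ == "__main__":
    print(round(difference_sd(0.85), 1))        # SD of I.Q. minus E.Q.
    print(round(share_with_gap(0.85, 10), 2))   # share with a gap of 10+ points
```

On these assumptions rather more than a fifth of children would show a gap of ten points or more in one direction or the other, even though the two measures correlate 0.85, which is why a discrepancy taken alone says little about its cause.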


The greater the influence of other group factors, naturally the
bigger the discrepancies. Thus mechanical arithmetic, handwriting
and manual subjects are least well predicted by intelligence tests
because of their low g + v content (cf. Burt, 1939). For the same
reason, pictorial, performance, or abstract non-verbal tests with
no v-content—while excellent for experimental research purposes
where it is desired to isolate a non-verbal g factor—are of
relatively little educational or vocational value. However, there is
some evidence that non-verbal group tests may be rather more
predictive of ability in technical and scientific subjects at the
secondary stage than are verbal tests (cf. p. 90). It is particularly
unwise to use them as a criterion of educational potentiality,
except possibly among deaf or non-English-speaking pupils
(even among the latter, a verbal test in their mother tongue, plus
a test of such English as they have already acquired, is likely to
give a better indication of their capacity for acquiring an education
in English).

Note that this view in no way detracts from the usefulness of 
intelligence tests for selection purposes. The potential grammar 
school pupil requires high-level thinking skills as well as good
attainments in the basic subjects. But it does involve some re-orientation
in current applications of tests for educational guidance.
The clinic or educational psychologist should clearly not
rely on I.Q.-E.Q. discrepancy to pick the cases most likely to
benefit from remedial treatment. The teacher’s subjective judg- 
ment that Johnny ‘could do better’ or is ‘under-functioning’ in 
school, is likely to be as good a guide, though this needs, of 
course, to be supplemented by a thorough investigation of the 
personality, home, schooling and other relevant circumstances. 

In the secondary grammar school, intelligence tests show a much
lower relation to all-round school work, mainly because of the
greater homogeneity (the high degree of selection) of the pupils,
though also because specialised talents are developing. They continue
to provide a useful indication of intellectual level in the
modern school, where the range of ability is much wider. For the
same reason they give moderately useful predictions in American
colleges and universities, with their heterogeneous intakes (cf.
Eysenck, 1947), but show very little correspondence with university
success in Britain—correlations of 0.2 to 0.3. The enormous
majority of British university students obtain I.Q.s in the 115 to
150 range, and Honours graduates score higher on the average
than Pass degree ones. But occasional students may achieve
successful careers with I.Q.s down to 100, and there is little
evidence that tests of the i 


selection of students by 
performance. Nevertheless
prehension and reasoning 


are generally used in America (cf. p. 25), could certainly be
devised if university selecting bodies gave any encouragement to
the psychologist (cf. Himmelweit and Summerfield, 196). 
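
The way selection shrinks a correlation can be sketched with the standard correction for direct restriction of range on one variable. The sketch assumes a linear relation with constant scatter and that selection acts directly on the test score, which is a simplification of any real admissions procedure; the starting correlation and the ratios of standard deviations are illustrative assumptions, not estimates for any actual student population.

```python
from math import sqrt


def restricted_correlation(r, sd_ratio):
    """Correlation expected in a selected group.

    r        -- correlation in the unselected population
    sd_ratio -- SD of test scores in the selected group divided by the
                SD in the whole population (1.0 means no selection)
    """
    u = sd_ratio
    return r * u / sqrt(1.0 - r * r + r * r * u * u)


if __name__ == "__main__":
    # Illustrative values only: how a population correlation of 0.85 would
    # shrink as the selected group's spread of scores narrows.
    for ratio in (1.0, 0.75, 0.5, 0.25):
        print(ratio, round(restricted_correlation(0.85, ratio), 2))
```

The narrower the spread of scores in the selected group, the lower the correlation falls, without any change in what the test measures.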

It is sometimes argued that British universities could and 
should be greatly expanded because there are so many individuals 
in the general population with I.Q.s as good as those of most 
students, whose talent is being wasted. But the correlation is so 
low that this argument is little better than saying that—since 
university students tend to be taller than average, therefore all men
of 6 ft. and over should go to university. The supply of suitable
students depends much more on a combination of the following 
factors than on the I.Q. distribution: 

(i) The educational and vocational aspirations of the family; its 
expectation that the children will undertake an arduous educa- 
tional career and eventually enter high-level jobs, and the material 
and moral support it provides towards these ends. 

(ii) The child’s own drives, interests and ideals. 

(iii) The traditions and current attitudes in the schools the child
attends, and in society generally, and the prestige of occupations
requiring university training.

(iv) The effectiveness of teachers and teaching methods in
developing favourable attitudes among pupils towards the
academic subjects and education generally.

Modern conceptions of intelligence and its testing also require 
some revision of our notions of streaming, or the segregation of 
brighter and duller pupils (cf. Vernon, 1957). In the 1920s much 
was heard of the tremendously wide range of Mental Ages among 
children in a typical school class, and of the advantages of homo- 
geneous grouping or streaming. It was said that, when children 
are grouped mainly by age, the bright ones are apt to become
lazy or conceited and the dull ones depressed and resentful.
Moreover, this diversity increases the higher up the school: in a
year class about 13½ per cent of pupils were intellectually
equipped to do the work of the classes one year or more in
advance, and 13½ per cent capable only of the work of classes
a year or more younger. But at 12 years, the proportions so misplaced
would be likely to reach 29 per cent.
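
The percentages quoted can be approximately reproduced under simple assumptions: that I.Q.s are normally distributed with a mean of 100 and a standard deviation of 15, and that Mental Age equals I.Q. multiplied by Chronological Age and divided by 100. Both are assumptions made for this sketch rather than a statement of how the original figures were derived.

```python
from statistics import NormalDist

IQ = NormalDist(mu=100.0, sigma=15.0)  # assumed I.Q. distribution


def share_year_ahead(age, years=1.0):
    """Proportion of an age-group whose Mental Age is `years` or more ahead."""
    threshold_iq = 100.0 * (age + years) / age
    return 1.0 - IQ.cdf(threshold_iq)


if __name__ == "__main__":
    # By symmetry the same proportion falls `years` or more behind.
    for age in (6, 8, 10, 12):
        print(age, round(100 * share_year_ahead(age), 1))
```

On these assumptions about 29 per cent of 12-year-olds come out a year or more ahead, in line with the figure quoted, and a proportion near 13 per cent is obtained at about age 6.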

However, grouping by M.A. would obviously involve an
enormously wide Chronological Age range in each class; there
would be serious social difficulties if, say, bright 5- or 6-year olds
were taught along with dull 9- or 10-year olds. Hence it was
thought better to aim at three streams—the bright, the average and
the dull—in each age-group; so that any one class would contain
pupils all of nearly the same age and of a limited range of M.A.

For a number of reasons this policy is less favoured nowadays,
particularly in primary schools.

1. Clearly it cannot be applied in the vast number of small
schools which take in only 30 to 40 children or less per year.

2. Inevitably it has bad effects on the morale of the pupils
and their parents, since it is impossible to conceal from the duller
groups that they are regarded as failures; and this inhibits their
interest in educational progress.

3. There is grave danger of any classificatory system becoming
unduly rigid, and thus neglecting the considerable fluctuations in
abilities and interests which we have seen to be characteristic of
mental growth. Even the most successful schemes of secondary
school selection relegate some 5 per cent of children to modern
schools who could later have done work of grammar school
standard and admit another 5 per cent who are relatively unfitted.
Too often in the primary school, also, children are classified soon
after entry, and the A’s are brought on so much more rapidly that
the B’s and C’s soon lose any chance of catching up. We can, however,
calculate that in such a three-stream primary school, only
about two-thirds should be expected to stay in the same ability
stream throughout, and some 10 per cent are likely to merit
re-classification every year, i.e. to shift up or down one or even two
streams.

Insufficient flexibility in school organisation is at least as serious 
at the bottom as at the top of the ability range. Suppose that it 
were possible to segregate 5 per cent of the most backward at any 
one time in special schools or classes, then we can calculate that 
only about 14 per cent are likely to remain in this subnormal
group throughout their primary school careers. On the other
hand, as many as one-quarter of the total would at some stage in
their schooling merit inclusion in the backward group. Figures
such as these clearly point to the desirability of temporary backward
classes, as against the traditional policy of removing educationally
subnormal and defective children more or less permanently
into special schools.


4. We have pointed out many times that success in any school 
subject involves other factors besides the g + v measured by
intelligence tests. There is no justification for the argument that
Mental Age provides a more fundamental measure of potentiality
than do other kinds of assessment. Thus, if we must stream, it
would be better to base it on a combination of average attainments,
intelligence and teachers’ assessments, just as we do in
selection at 11-plus, and as is, in fact, usually done in the ascertainment
of E.S.N. pupils. But even if a class is homogeneous in
general promise, we must expect to find much greater variations
among its members in particular subjects or in branches of a
subject. It can never be homogeneous for everything. This, of
course, is often recognised at the secondary level and partially
met by cross-classification or setting for mathematics, foreign
languages, etc.
Nevertheless, this objection to grouping is sometimes exaggerated.
Actually the general factors in attainment (g + v + X) in the
primary and secondary modern school tend to be much more
prominent than group factors. Thus the proposal of the Norwood
Report to classify children according to their type of ability
(academic, technical or practical) was totally unrealistic (cf. Burt,
1943a), since success among 11-15 year olds at almost any
curriculum depends mainly on the same combination of abilities,
personality traits and home backing. Provided, then, that sufficient
flexibility can be maintained, class organisation on a basis of
general suitability is educationally sound. The wise teacher can
readily cater for special talents within such classes, for example by
using a boy’s relative superiority at handwork, stamp collecting
or social leadership to boost his morale and encourage his application
to the more formal subjects in which he is relatively inferior.
5. Finally, we must ask whether there is any evidence that
streaming within schools, or segregation into schools at different
levels of ability, produces superior results to unstreamed teaching.
Several American researches indicate that over the whole range
of the school population there are no measurable advantages.
At the same time it is virtually certain that the extreme ends of the
distribution of ability do benefit from teaching by specially trained
staff, using methods adapted to superior, or to very dull,
pupils. Imbeciles and idiots must clearly be segregated, and the 1½
to 2 per cent of higher-grade defectives seem to make better progress
in E.S.N. schools or special classes than in ordinary schools.
There is indeed a strong case for expanding backward classes to
allow remedial work with, say, the per cent most backward,
provided always that transfer to and from such classes is easy.
At the other end of the scale, there is no doubt that grammar and
public school pupils make better progress in segregated schools
or segregated streams within the comprehensive school. Such 
advanced subjects as mathematics, English grammar, classics or 
history can probably be tackled with profit only by the 10 to 15 
per cent most able pupils. But the stage at which segregation 
should occur, and the most appropriate type of school organisa- 
tion which will cater for this group, without denying oppor- 
tunities to later developers, are matters of educational values and 
policy rather than of psychology and mental measurement. The 
most we can infer from psychometric evidence is that there should 
probably be less rather than more streaming within secondary, 
and even less within primary, schools. It is likely that the full 
realisation of each individual’s potentialities can be better
brought about by extending individual and small-group work
within a heterogeneous class than by class-teaching of 30, 40 or
more children belonging to a restricted range of ability.


INTELLIGENCE TESTS AND VOCATIONAL SUITABILITY 


When intelligence tests are applied to adults in various jobs,
very considerable differentiation may be observed according to
the degree of intellectual ability and the length of training that
the jobs involve. Fig. 8 shows some results collected for 10,000
Army recruits in 1939-40 by Himmelweit and Whitfield (1944),
using a 10-minute verbal test. Scores have been converted to
equivalent I.Q.s by the writer. Each occupational group includes
120 or more cases. The middle line in each bar represents the
median score, and the ends of the bar the 90th and 10th percentiles.
For example, all but the top and bottom 10 per cent of
those who were labourers in civil life fall between I.Q. 73 and 106.
Now, while the average school-teacher obtains a much higher I.Q.
than the average labourer, it will be seen that there is considerable
overlapping. Indeed, some of the highest 10 per cent of labourers
are apparently more intelligent than some of the lowest 10 per cent
of teachers. It is likely that the professional persons of lowish
intelligence would be rather slow in passing their courses, and
that the highly intelligent labourers may be dissatisfied with the
intellectual content of their jobs. But researches dealing with
particular occupations show that, within limits, too low (or too
high) intelligence is not necessarily a sign of incapacity or inefficiency.
In other words, a school-leaver’s I.Q. will give us a
general picture of the level of job for which he is likely to be
suited. But the correlation with success in any one job is usually
too slight for any accurate prediction. Again, intelligence tests
alone show nothing about the relative suitability of different
types of job (involving different specialised abilities and interests)
at any one level.
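
The percentile figures quoted for labourers can be turned into a rough picture of the whole distribution. The following is only a sketch: it assumes the scores are normally distributed and symmetric about the midpoint of the two percentiles, which the text does not state, and the function name is chosen purely for illustration.

```python
from statistics import NormalDist


def normal_from_percentiles(p10: float, p90: float) -> NormalDist:
    """Fit a normal curve whose 10th and 90th percentiles are p10 and p90.

    Assumption (not from the text): scores are normally distributed.
    """
    z90 = NormalDist().inv_cdf(0.90)  # about 1.28 standard deviations
    mean = (p10 + p90) / 2            # symmetry puts the centre at the midpoint
    sd = (p90 - p10) / (2 * z90)
    return NormalDist(mean, sd)


# Labourers: 10th percentile I.Q. 73, 90th percentile I.Q. 106 (from the text).
labourers = normal_from_percentiles(73, 106)
share_above_100 = 1 - labourers.cdf(100)

print(f"implied centre {labourers.mean:.1f}, implied S.D. {labourers.stdev:.1f}")
print(f"share of labourers above I.Q. 100: {share_above_100:.0%}")
```

On these assumptions a substantial minority of labourers score above the general average, which is the overlapping the paragraph describes. The same function could be applied to any other occupational bar in the figure once its two percentiles are read off.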

Another qualification is needed. The fact that some groups, such
as teachers, generally show above average intelligence, does not
prove that people could not do this work with a lower score.
Entry to the profession is restricted mainly to young people with 
successful secondary school records. Also teaching is a common 
choice for the bright leaver who has no strong talents or inclinations
for other careers, or little opportunity for gaining a foothold
in them. Thus we do not really know to what extent the present
distribution of teacher I.Q.s is artificially boosted. Other jobs, for
similar reasons, may be failing to attract as good a range of
intelligence as they need.

Nevertheless, we have shown elsewhere (Vernon and Parry,
1949) that intelligence tests do sometimes give useful indications
of vocational suitability. First, they correlate moderately well with
proficiency at clerical jobs, or other work that involves dealing
with words and figures. They are least related to jobs where the
work is highly specialised (e.g. radio), where manual skills are
involved, or where personality qualities are of greater importance
than intellectual skills or knowledge (e.g. business or teaching). 
Secondly, they are very valuable when, as in the Armed Forces, 
rapid trainability, and in particular the capacity to learn mechanical, 
electrical or other theory, are important. Thirdly, much depends 
on the degree of heterogeneity of the candidates. If these are pre- 
selected on educational or other grounds, intelligence tests have 
little to contribute. But when, again as in the Services, a very wide 
range of ability occurs, they certainly help in allocating brighter 
recruits to more skilled employments. In civilian life they can, for 
example, be given with advantage to intending nurses, who may 
not only be very heterogeneous in background but also have a 
lot of theory to learn. At the same time they will predict success 
only moderately well, since personality qualities obviously play 
such a large part in good nursing. In general, tests should certainly
be used in vocational guidance of modern school-leavers (the
11-plus results are not adequate at 14+), but are less necessary with
grammar school-leavers whose academic record is known.

A surprising finding during the war was that the purer non-verbal
tests of g were generally less useful in predicting occupational
abilities than tests with an obvious educational bias, that is, tests
of g + v. Indeed, a test of arithmetic and mathematics was the
most useful of all, not only for clerical and technical jobs but for
predicting all-round efficiency. Spatial tests and mechanical
comprehension and information (k:m) certainly had a contribution
to make in the selection of mechanics, but in almost all jobs
the main requirement was that same complex of qualities as
grammar schools look for at 11-plus (cf. p. 181). More specialised
tests of group factors, indeed, were seldom of much value in
allocating recruits to suitable jobs, when a very wide variety of
possibilities existed. The same is likely to be true in vocational
guidance of school-leavers. Vocational selection is different. When
an employer wishes to pick the best candidates for one particular
type of job, he should proceed to analyse the aptitudes and qualities
required and test them by an appropriately designed battery of
tests. But the usefulness of tests in guidance has often been exaggerated.
General intelligence and educational level should be
determined; supplementary mechanical-spatial tests (for boys) and
clerical tests are often worth adding. But much more attention
should be paid to interests, relevant experience and opportunities, 
home background, physical assets and weaknesses and personality 
qualities. 

A final caution may be in place against over-valuating intelli- 
gence as measured by tests. Because this book has attempted to 
answer misinformed or prejudiced critics, it has naturally stressed 
such evidence as that of Terman and the high correlations with 
secondary school success. But most of the world’s work is carried 
out by men and women of around average intelligence. The 
intelligent adult or child is not, as some people seem to suspect, 
socially undesirable in any way that we can discern, but he is also 
not the only desirable citizen.


APPENDIX: ANSWERS TO EXAMPLES

elbow
23 30
OPPOSITE
avoid
nightingale
perfunctory meticulous
fish
a door
Fireplace
GHV
the fourth figure
cottonwool flour
the third figure
second minute
32
the third figure
sun
2 Stephen
sacrifice object
FALSE
SILLY
north
summer winter
the fourth figure
the first figure
Summer (or New Year, Christmas)
7643
foal
sun
24
54. 381191
55. No. 4
56. the second figure
57. the fourth figure
58. the third figure
1, 6, 8 squares
2, 4, 5 upright rectangles
3, 7, 9 other rectangles
1, 3, 5, 8 unshaded
2, 6, 9 half-shaded
4, 7 shaded
the second answer
.4771
mediocre
were
gave
cow
audacious
77. wheels
78-79. Any complete, grammatical sentences
80. was stolen from the larder by Jim
81. if she could go